DOCUMENT RESUME

ED 118 165                                        JC 760 089

AUTHOR          Powell, Robert
TITLE           Grading Style and Student Evaluation of Faculty.
INSTITUTION     William Rainey Harper Coll., Palatine, Ill.
PUB DATE        Apr 75
NOTE            57p.

EDRS PRICE      MF-$0.83 HC-$3.50 Plus Postage
DESCRIPTORS     Annotated Bibliographies; Correlation; Evaluation
                Criteria; *Grades (Scholastic); Junior Colleges;
                *Literature Reviews; *Post Secondary Education;
                Student Attitudes; *Student Teacher Relationship;
                *Teacher Evaluation
IDENTIFIERS     William Rainey Harper College

ABSTRACT
This paper discusses the association between student
grades and student ratings of faculty. The first section reviews a
1974 study of Harper College English teacher ratings, which showed a
correlation of .73 between the grades the teachers gave students and
the ratings students gave the teachers. The second section reports
the findings of a 1975 replication study which showed grade-rating
correlations of up to .79. The third section provides a review of the
literature in the form of an annotated bibliography, indicating that
the Harper findings are typical of the findings of prior research at
other colleges. Twenty-eight studies involving more than 70,000
student ratings of faculty in more than 50 colleges and universities
have been conducted and published since 1953. In every study, at
least some association has been found between grades and ratings, and
in a number of the studies, the association has been found to be
quite powerful, with correlations ranging up to .90. The fourth
section of this document discusses the implications of the findings,
concluding that the widely-held belief that grades and ratings are
unrelated is a myth, relying for its support on studies conducted
more than 20 years ago, studies that are weak in design and
execution, and sometimes less than candid in reporting the data.
(Author/HHM)






GRADING STYLE
AND
STUDENT EVALUATION OF FACULTY

Robert Powell
William Rainey Harper College
April, 1975

GRADING STYLE AND STUDENT EVALUATION OF FACULTY 

This paper discusses the association between student grades and student
ratings of faculty. It is organized as follows: the first section reviews
the findings of a 1973-74 study of the ratings of Harper College English
teachers; the second section reports the findings of a just-completed
1974-75 follow-up study; the third reviews the literature of the field to
determine if Harper College results replicate those of other colleges; the
fourth discusses the implications of the findings.



The 1973-74 Study

The original study, made at the end of the fall 1973 term, appeared in
March 1974 under the heading "Evaluation and Student Grades." It showed
strong associations between student grades and teacher ratings for the
18 full-time Harper English faculty members. The coefficient of correlation
between grades and ratings, as computed by a statistical formula called
Spearman's rank order, was a high .73. The chance of the finding being
accidental was determined to be less than one in 100.*
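
For readers unfamiliar with the rank order method, a minimal sketch in
Python follows. It is an illustration only: the function names are
hypothetical, the five pairs of figures are invented and are not Harper
data, and the shortcut formula used assumes that no two teachers are tied
on either measure.

```python
def ranks(values):
    """Rank values from 1 (highest) downward. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rank order coefficient: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two values")
    n = len(xs)
    # d is the difference between the two ranks held by the same teacher.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical figures for five teachers (NOT taken from the study):
mean_grade_given = [2.9, 2.1, 2.5, 3.0, 2.0]
mean_rating_received = [4.3, 3.9, 4.0, 4.2, 3.8]

rho = spearman_rho(mean_grade_given, mean_rating_received)
print(round(rho, 2))
```

A coefficient near 1 means the teachers fall in nearly the same order on
both lists; a coefficient near 0 means the two orderings are unrelated.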

The strength of the association is illustrated by the following data. The
5 teachers who received the highest mean student ratings (above 4.19 on the
5 point rating scale then used) had given an average of 32% A's and 37% B's
to their students. The 5 teachers who ranked lowest (below 3.93 on the 5
point scale) had given an average of 9% A's and 23% B's. The mean grade
point average assigned to students by the 5 highest ranking teachers was
2.90, just below B on the 4 point scale. The g.p.a. of the 5 lowest ranking
teachers was 2.06, just above C.

The evidence pointed to a powerful relationship between grades and ratings.
It suggested that those teachers who insisted on conservative grading
standards might be at a very considerable disadvantage in competing for
merit raises, promotions and sabbaticals if student evaluation of faculty
continued to play a role (as the Harper Board of Trustees insisted it should)
in the college's faculty evaluation system. The report thus suggested that
computer-summarized faculty rating scales were perhaps of questionable
validity, in spite of what was assumed to be massive evidence to the
contrary, and that their continued use in a competitive faculty evaluation
system might set up a jockeying for position among faculty members that
could affect the standards of the college and could lower rather than raise
the quality of teaching.

*For an explanation of the statistical notations used in this paper see
"A Statistical Note," Appendix E.

The 1974-75 Study

THE SAMPLE

The present study attempts to determine whether last year's findings would
be replicated with a new and standardized teacher evaluation form, the
nationally distributed University of Illinois C.E.Q. form having replaced
the Harper committee-created form used last year. The conditions of
rating were also more controlled. Last year, though the student ratings
were anonymous, the instructor administered his own ratings and turned
them in to the Division office. This year the anonymous ratings were
placed in sealed envelopes by students, and the instructor did not see
them until the semester was over. In addition, this year a statement was
read to the students telling them that the ratings would be used as evidence
for promotions and pay increases. In last year's ratings a few students
might not have been aware that the results would be used for personnel
purposes. The more rigid controls are important. Research studies by
Aleamoni and Hexner (ERIC ED 081 405, 1973) and others have demonstrated that
evaluations tend to be higher (1) when the instructor administers them and
(2) when the students are aware that they will affect the instructor
personally or professionally.

Of the 1[illegible] full-time English teachers in the college, 16 made
available their confidential computer printouts summarizing the results of
student evaluations taken near the end of the fall term in December, 1974.
In all, printouts of 35 of 40 evaluated sections were voluntarily made
available and are included in the study. The college evaluation system
requires that teachers be evaluated by at least two of the four or five
sections they teach each semester. The division chairman selected the
classes to be evaluated. Most of the teachers are thus represented in this
study by two classes, but two teachers were evaluated by three classes and
one by four. One teacher is represented by only one section.



THE DESIGN: COMPARING HIGH AND LOW GRADERS

The 16 English teachers in the study fall into two separate and easily
identifiable groups according to their habitual patterns of grading.
Column 1 of Table A shows the mean grade point average given by each
of the teachers to all their students in all classes in the fall term
of 1973. Column 2 shows the same data for fall 1974. Column 3 is an
average of the first two columns. It is the teacher's grading style index
and is the basis of the 1 through 16 rankings shown in Column 4. Columns 5
and 6 show the percentage of A's given by each teacher in fall of 1973
and fall of 1974 respectively.

The two groupings are apparent from the table. "High graders," teachers
1 through 8, assign average grades from 2.50 upward, the top of this group
having grading style indexes just above and below B. They are generous with
A's, the median for the group being 29%. "Low graders," teachers 9 through
16, assign mean grades of from 2.50 downward to just above and below C.
They are stingy with A's, their median being 12%.

The department has low turnover. The teachers are experienced. The grading
pattern changed little between 1973 and 1974, even though an "N" (not
completed) grade counting as 0 in teacher grade point averages was added
to the A through F system in 1974. A check of grades for 1972 and 1971
reveals the same patterns, though the departmental grade point average has
risen. High graders remain high and low graders low. The only movement
between groups occurs at the very bottom of the high group and the top of
the low group.

The grading style index is not influenced strongly by the type of course
taught or by the time of day the course is given, though literature courses
and other electives tend to be graded somewhat higher. The department mean
average for evaluated literature courses is 2.70 and for English 101, 2.52.
All the teachers usually teach three or four composition courses and one
literature. There are no common final exams. Though some sections naturally
tend to be of higher ability than others and would tend to receive higher
grades, their effect on the grading style indexes is thought to be equal
for all. Grades are based exclusively on the instructor's own opinion of
what the student has earned. Habitually that opinion differs between those
who belong to the "high grader" group and those who belong to the "low."



FINDINGS

Item 9 on the C.E.Q. asks the students to rate the instructor's overall
performance on a 6 point scale: 1-very poor, 2-poor, 3-fair, 4-good,
5-very good, and 6-excellent. The computer adds all student ratings in
each section and prints a mean section rating. These section means
become a statistic on the charts used by peer and administrative evaluators.

Table B shows the 35 section means in two columns. The left column lists
the 18 courses taught by the 8 high graders. The right hand column shows
the 17 courses taught by low graders. The median section rating of the
high graders was 5.32; the median score of the low graders was 5.00. Only
one class taught by a low grader reached the median of the high graders.
The average section of the high graders was 5.22, the average of the low
graders 4.78. Every high grader except one placed at least one section at
5.22 or above, and the one who did not make it was close, at 5.17. Four
sections taught by low graders reached that level.

The students were asked to place the grade they expected to receive in the
course on the rating form. An expected grade mean could thus be computed
for each section. Actual final grades assigned by the teacher at the end of
the semester were also available. The average mean section expected grade
for the low graders was 2.86; for the high graders, 3.13. The average mean
section final grade for the low graders was 2.33; for the high it was 2.80.
It is significant that the four sections of low-grading teachers that reached
the high-grader mean of 5.22 show both expected grades and final grades well
above the teacher's usual pattern. The average expected grade for those four
sections was 3.07 and the average final grade, 2.65. The final section
grades are 40% to 50% of a grade level above the teachers' grading style index.

Coefficients of correlation computed by the Pearson product moment method were:

Mean teacher ratings in 35 sections with:

    Student expected grade mean:      .49
    Student final grade mean:         .43
    Teacher grading style index:      .58*

English 101 sections only with
    expected grades:                  .54

Literature and electives with
    expected grades:                  .46
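
The product moment method named above can be sketched as follows. This is
an illustration under stated assumptions, not the computation used in the
study: the function name is hypothetical and the five pairs of section
figures are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical section figures (NOT the Table B data):
expected_grade_means = [3.2, 2.8, 3.0, 2.6, 3.4]
section_rating_means = [5.3, 4.9, 5.1, 4.6, 5.2]

r = pearson_r(expected_grade_means, section_rating_means)
print(round(r, 2))
```

Unlike the rank order method, this coefficient uses the actual distances
between section means, so a single extreme section can move it noticeably.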






The Spearman rank order correlations between teacher grades and ratings in
each teacher's highest rated class are shown in Table C. The rank order
correlation between mean section rating and teacher's grading style index
is .75.* Between the ratings and section final grades, it is .79.

The 1973-74 study findings were obviously replicated in the 1974-75 study.
Unless the Harper correlations are a one in one hundred statistical accident,
grades, or whatever grades symbolize, have a very important association with
teacher ratings, perhaps accounting for one-third to two-thirds of the
differences in teacher ratings.
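
The "one-third to two-thirds" estimate presumably comes from squaring the
coefficients, the usual way of expressing how much of the variation in one
measure is shared with another. A minimal sketch follows, using the .58 and
.79 reported above. The function name is hypothetical, and applying the
same reading to a rank order coefficient is an assumption of this sketch.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures that correlate at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# Coefficients reported in the 1974-75 findings.
reported = {
    "ratings with grading style index (Pearson)": 0.58,
    "ratings with section final grades (rank order)": 0.79,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, r squared = {shared_variance(r):.2f}")
```

Squaring .58 gives about .34, roughly one-third, and squaring .79 gives
about .62, roughly two-thirds.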




*Of the five evaluated English sections not made available to this study,
four were taught by low graders. Their grading style indexes and rating
ranges are known. If it had been possible to include them in the study
they would have increased the grading style index correlations.



Conflicts With Expert Testimony 

The evidence from the replicated studies thus indicates that in the English
area of Harper College there is a continuing relationship between the grades
a student receives and the ratings he gives his instructor at the end of the
course. The findings, however, run counter to what expert testimony has
predicted would be found. In the past several years the college has brought
in no fewer than six outside consultants to help it establish a system of
student evaluation of faculty. All seem to have ignored or played down the
effect of grades on teacher ratings. In doing so they were supported by
a large body of literature, their own and others', that repeatedly states
that no such association exists.



For example, W. J. McKeachie, one of the consultants brought to Harper,
said in 1973 in the Proceedings of the First Invitational Conference on
Faculty Effectiveness as Evaluated by Students:

    "The classic research on most aspects of student ratings
    of instructors was carried out by Henry Remmers and his
    students at Purdue. His results are still largely
    unchallenged by more recent research. Among the factors
    which did not significantly affect ratings were such
    student characteristics as:

        Veteran/non-veteran status
        Age
        Sex
        Class standing
        Grade in Course (However when the top students
            achieve more than expected they rate the course
            higher, and when the poorer students do better
            than expected they rate the course higher.)"

Kenneth E. Eble, another Harper consultant, says in his 1972 book, Professors
as Teachers:

    "Scrutiny of thousands of questionnaires at perhaps the
    easiest point for testing the popularity hypothesis --
    the correlations between favorable grades and favorable
    responses -- repeatedly shows no correlation."

Professor Eble, the former director of the AAUP Project to Improve College
Teaching, is probably the best known of the contemporary authorities on
student evaluation of faculty.







Charlotte Epstein, writing in the April, 1974 issue of the Community
and Junior College Journal, asks a question:

    How do the perceptions of faculty compare with the
    findings of scholarly research on the validity of
    student evaluations?

She took a faculty poll at her community college to find the answer. She
reports:

    In most cases faculty attitudes do not agree.
    Research findings, for example, do not support
    the faculty view that student ratings are affected
    by grades, class size, or whether or not the
    student is majoring in the discipline. Nor do
    the faculty seem aware of a body of research
    which cites student ratings as remaining
    unaffected by the sex of the instructor or the
    student's grade point average.

The testimony of authority has thus been quite strong in this area.
Richard I. Miller, in his book Developing Programs for Faculty Evaluation
(1974), the most complete and scholarly work in its field, comes to the
conclusion that grades and ratings are only marginally related, if at all,
quoting the previous review findings of Costin, Greenough and Menges of the
University of Illinois, published in Review of Educational Research in 1971.
Lawrence Aleamoni, still another consultant at Harper, said in an address
delivered at the Symposium on Methods of Improving University Teaching held
at the Israel Institute of Technology, Haifa, Israel in 1974:

    In almost all the studies cited in Costin, et al. (1971)
    and by investigators such as Guthrie (1954), Remmers
    (1960) and Weaver (1960) little or no relationship
    has been found between a student's grade and faculty
    rating. In fact, the positive correlations seldom
    exceed .30. The evidence, therefore, indicates that
    students do not necessarily rate an instructor or
    course based upon the grade they have or are about
    to receive.

Mr. Aleamoni is with the Measurement and Research Division of the Office
of Instructional Resources of the University of Illinois, publishers of
the Illinois Course Evaluation Questionnaire (CEQ).







It is no wonder that authorities in the field, and administrators who have
been in contact with them, sometimes express irritation at those faculty
members who, operating from a gut feeling, insist that a quantitative
evaluation system forces them to play to the wishes of those students who
are least ambitious and least able. Professor Eble, who is both authority
and administrator, expresses such irritation when, in the January, 1974 issue
of College English, he attacks an article by Evelyn Kossoff, which had
appeared in the winter 1971-72 edition of The American Scholar. Ms. Kossoff,
whom he calls a "former English teacher," had criticized ratings from a
philosophical rather than an empirical point of view. After calling her
article another of the "eruptions of ignorance" that he keeps confronting
in respectable places, he says:

    The basis of information from which Ms. Kossoff's
    article proceeds is (1) "not long ago I saw a
    questionnaire," (2) "another widely circulated
    evaluation questionnaire," and (3) two survey
    articles in 1953 and 1963 general reference
    works. These sources offer as little information
    about evaluation as "there flashed through
    my mind the picture of one professor who..."
    (Ms. Kossoff's words) affords about the nature
    of effective teaching. It is as if she set out to
    question the validity of current cancer research
    by citing a pamphlet picked up in a chiropractor's
    office and an article in a 1953 encyclopedia. ...
    All this is bad enough as measured by any standards
    of scholarship, but it is worse when one considers
    that a writer working within a university community
    might come across one or more of the following:

Professor Eble then goes on to explain that there were 50 items that might
have come to Ms. Kossoff's attention, including "the existence of the
University of Washington office of student evaluation since 1925; and the
fact that I (Eble) had been on her campus (U. of Kentucky) the previous
October talking with a campus-wide audience about evaluation." He
continues: "That is why I turn to willful ignorance as an explanation of
this kind of imperviousness to information on a subject important to college
teaching. ... The examined life is held as a scholarly ideal as long as it
stops short of examining teaching."







One can well understand Professor Eble's frustrations, as he expressed them
in the College English article, which he entitled "What Are We Afraid Of?"
He has worked long and hard in the field, making some progress, as testified
to by his statement in Professors as Teachers: "recent inquiries suggest
that the use of systematic student ratings has greatly increased since 1966."

Collecting the Data

The fact remained that in the Harper College English courses, grades and ratings
were so closely connected that the wisdom of continuing to use quantified
evaluations could be called into question. Even though expert testimony
pointed to error or statistical accident as the cause of the Harper findings,
it was thought best to look at the past research done in other colleges for
clues as to why the Harper findings were different. It was decided to do a
thorough job, to avoid the easy habit of picking up studies in chiropractor's
offices or using examples of the "there flashed through my mind a picture"
type. It was decided to do the most thorough job yet attempted in this field:
to locate and summarize every original-source research study published since
1930 that focused in whole or part on the student grades-teacher ratings topic.
It was decided to look closely at the research, not just at the conclusions
the researcher reached, but to study the intention, sample, design and
execution of the work as well.

The collections of two large university libraries were searched. An ERIC
computer search was run. More than 200 review studies were read for their
references and their bibliographies. More than 75 studies were xeroxed and
summarized. Many turned out to be secondary sources or student achievement
studies, but 41 seemed to meet minimum requirements; that is, the author had
looked at a body of student ratings in a planned way for the specific purpose
of determining if grades and ratings were related, and had said something
about the size of his sample, his method of examining the data and the
results of his investigation.

It became plain early in the search that many studies listed in the review
literature as showing that grades and ratings were unrelated had hardly
enough substance to make the list of 41. Their authors had made no serious
attempt to compare one teacher's results with another, or they had used as
samples teaching assistants who had no say in what grade the student earned,
or they had used questionable research designs. These weaknesses were not
apparent when one read only the researcher's conclusion, but became obvious
when one went deeper into the study.

Table D lists all of the 41 empirical studies found in the search. It is
non-selective. No study has been omitted. Those studies that seem to meet
minimum research standards are marked with an asterisk. The studies are
arranged chronologically from 1930 onward in Column 1 of the table. Column 2
shows the number of students, teachers or sections involved in the study.
Columns 3, 4 and 5 show the strength of the grade-rating associations found
by the researchers. If the authors of the study concluded that the
correlation between grades and ratings was negative, nil or negligibly
positive, the results are entered under Column 3. If weak to moderate
associations were found, the results are found under Column 4. Finally, if
marked or strong associations were found, or if the author believed the
association he found to be quite important, it is entered under Column 5.



It's quite obvious from a glance at the table that those who maintain that
research has shown that there is no relationship between student grades and
student ratings of faculty seem to be right up to a point (1953 to be exact),
but this paper will hereafter show that they were not right even to 1953.
It's quite obvious that they are not correct from 1953 onward. Of the 28
studies conducted since 1953, six showed negligible correlations, 10 showed
low to moderate correlations and 12 showed marked to strong correlations.

Because of the apparent conflict between what Professor Eble and others who work
in this field have said about the relationship of grades to ratings and Table D,
it will be worthwhile, though time-consuming, to look more deeply at every one
of the studies so that better judgments can be made about what they signify.
All 41 are summarized on the following pages. The summaries begin with the
studies, all made from 1953 onward, that are listed in Columns 4 and 5 of Table D
and show at least a low relationship between grades and ratings. After that the
studies that showed negative, nil or negligible correlations are summarized. All
but six of these were conducted before 1953.

It should be remembered that some of the studies were concerned not only with
grades and ratings, but with other facets of student evaluation of faculty as well.
In most instances, the findings on the effect of class size, time of day of
class and the like have been omitted. Focus is on the key issue: the effect
of grades on the validity of the scales.




Some studies were surely missed in the search, and several listed in
bibliographies could not be found. Those unpublished studies that must lie
in filing cabinets at various colleges could not of course be included.
Still, it is felt that the 41 studies are the most complete collection yet
assembled and give a comprehensive picture of empirical research up to the
appearance of mid-1974 periodical indexes.










Studies Showing Positive Grade-Rating Correlations

The first research to show a relationship between student grades and teacher
rating was described in the article by A.M. Anikeef, appearing in 1953.
It followed a quarter century of studies that unanimously insisted that
grades were not important to ratings. The very considerable prestige
of psychologists H.H. Remmers at Purdue and E.R. Guthrie at the University
of Washington supported the no grade-bias position. Anikeef's was the
watershed. Since it appeared, the great bulk of the original research has
shown that grade bias exists. The research that has not will be shown in
the next section to be of questionable [illegible]. Anikeef's study and the
Column 4 and 5 studies that followed it are summarized below. The numbers
that precede the authors' names and the study title identify the study's
position on Table D. Anikeef's is the third study from the year 1953.



1953-3: A.M. ANIKEEF: "Factors Affecting Student Evaluation of College
        Faculty Members." Journal of Applied Psychology, 37, No. 6, 1953.



Anikeef studied 1400 ratings of 19 instructors in the School of Business
and Industry at Mississippi State College. He found a correlation of .73,
significant at the .01 level, between grading leniency scores (the
mean grade point average assigned by the instructors) and the ratings of their
instructors by freshmen and sophomores. For juniors and seniors he found
a correlation of .43, which he did not claim to be statistically significant,
a correlation of .48 being needed if significance is to be claimed when
working with the number of teachers in his study. The combined freshman-senior
correlation was [illegible]. Anikeef concludes that 53% of the variance in
freshman-sophomore ratings and 25% of the combined freshman-senior ratings
could be attributed to grading leniency.

    Comment: The design Anikeef used is similar to the one used in
the Harper English studies. The Spearman rank order correlations found are
similar. A merit pay system was in use at Mississippi State too.




1960-1: CARL H. WEAVER: "Instructor Ratings by College Students."
        Journal of Educational Psychology, 51, No. 1, 1960.

The article reports a study of 699 student ratings in 39 sections of
history, English, personnel and speech taught by 12 different instructors
at Central Michigan University. The teachers were not compared as in the
Anikeef study. Instead, expected grades listed by students on the Purdue
rating forms were pooled. Students expecting A's gave mean ratings of 96.10,
B's 94.56, C's 91.15, D's 84.63. The differences were significant at the
.001 level of confidence. The author suggests that grade bias is of
importance in interpreting ratings.



1964: PAUL P. ECHANDIA: "A Methodological Study of Factor Analytic
      Validation of Forced Choice Performance of College Accounting
      Instructors." Dissertation Abstracts, 1964 (25) (4) 2605-2606.

Studying 546 accounting students of 16 teachers at New York University,
Echandia found that students who received higher grades in the course
rated their teachers significantly higher on factors concerned with course
organization and lucid exposition. Motivational factors were not significantly
correlated with grades. No correlation figures are given in the abstract.



1965-1: R.E. SPENCER AND W. DICK: Reported in "The Illinois Course Evaluation
        Questionnaire: A Description of Its Development and a Report of Some
        of Its Results," by Lawrence Aleamoni and R.E. Spencer, in Educational
        and Psychological Measurement, 33, 1973.

Sample: 600 students in two courses at Pennsylvania State rating their
instructor using the Illinois Course Evaluation Questionnaire (CEQ) developed
by Spencer. Whether the two courses had more than two sections or two
instructors is not stated. Finding: "Course grades and rating scores
did correlate significantly (even though magnitude of the correlation
was small) with all the subscores except the instructor rating." Exact
correlations are not given.



1965-2: R.E. SPENCER AND W. DICK. Same source as 1965-1 above.

Sample: 160 students in 12 sections of Speech 101 at Pennsylvania State, using
the CEQ form. Findings: grades on six speeches and three tests all
correlated impressively with ratings: .85 for speeches, .86 to .91 for each
of the tests.

    Comment: These two studies were apparently reported first in the
1965 edition of the Manual of Interpretation for the CEQ by Spencer and
Dick. The 1972 edition of the manual, by Lawrence Aleamoni, seems in
its 64 pages to contain no specific reference to the Spencer and Dick
studies or to make any mention of a relationship between grades and ratings.
Aleamoni and Spencer do, however, in their 1973 article in Educational and
Psychological Measurement say, "It can be seen, then, that in some courses,
student opinion about the course is highly related to success in the course."
The CEQ form is used by the students at Harper College.

1966-2: CLIFFORD T. STEWART AND LESLIE F. MALPASS: "Estimates of Achievement
        and Ratings of Instructors." The Journal of Educational Research,
        Vol. 59, No. 8, 1966.

Sample: 1975 students rating 67 instructors teaching 53 courses at the
University of South Florida. Findings: "Highly significant relationships
were observed between estimated course grades and ratings of instructor
variables." These included strong associations between expected grades and
approval of the teacher's grading policy. The relationships were significant
well beyond the .001 level.

1969-1: B. DAYLE WALKER: An Investigation of Selected Variables Relative
        to the Manner in Which a Population of Junior College Students Evaluate
        Their Teachers. Dissertation Abstracts, 1969, 29 (9-B), 3474.

According to the abstract, 1447 students of 30 teachers at Lee Junior College
rated their teachers on the Purdue Rating Scale. No statistical correlations
are given, but the abstract says, "Students tend to rate teachers in the
direction of their stated anticipated grades."

1970-1: J. RUBENSTEIN AND H. MITCHELL: "Feeling Free, Student Involvement,
        and Appreciation." Proceedings of the 78th Annual Convention of
        the American Psychological Association, 5, 1970.

Sample: 1655elementary psychology students at Purdue in 60 sections. 

Results: Class grades earned up to the date of the rating correlated .14 

to appreciation of instructor and .30 to appreciation of the course. Final 

course grades correlated .09 to appreciation of instructor and .44 to 

appreciation of course. . ^ 



1971-1: DAVID S. HOLMES: "The Relationship Between Expected Grades and
Students' Evaluations of Their Instructors." Educational and
Psychological Measurement, 31, 1971.

Holmes studied ratings by 1539 students in 7 large lecture classes with
enrollment of more than 100 at the University of Texas. Grading was by
objective exams. He found statistically significant but small relationships
between expected grades and two of the three rating subscales. The series of
items gathered under the heading "Student Stimulation" were all associated
with expected grades, as were most of the items under the heading
"Interaction-Evaluation." However, only one item under the heading
"Instructor Presentation" was found to be modestly related to grades. The
mean amount of variance shared by grades and key evaluation items was found
to be 5% and the maximum 13%.
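
Most entries in this bibliography report correlation coefficients, while the Holmes summary reports shared variance. The sketch below converts one into the other so the figures can be compared. It assumes that "variance shared" means the squared Pearson correlation, which the summary does not state; the function name is hypothetical.

```python
import math


def correlation_from_shared_variance(shared_variance):
    """Return the magnitude of r implied by a shared-variance proportion.

    Assumes shared variance is r squared. The sign of the correlation
    cannot be recovered from r squared alone.
    """
    if not 0.0 <= shared_variance <= 1.0:
        raise ValueError("shared variance must be a proportion between 0 and 1")
    return math.sqrt(shared_variance)


mean_r = correlation_from_shared_variance(0.05)  # mean shared variance of 5%
max_r = correlation_from_shared_variance(0.13)   # maximum shared variance of 13%
print(f"mean |r| = {mean_r:.2f}, maximum |r| = {max_r:.2f}")
```

Under that assumption the 5% and 13% figures correspond to correlations of roughly .22 and .36.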



1971-2: RICHARD G. WIEGEL, E.B. OETTING AND DONALD L. TASTO: "Differences
in Course Grades and Student Ratings of Teacher Performance." School
and Society, 99, 1971.

At the beginning of their study Wiegel and his associates say "... reports
dating as far back as 1928 have shown there to be only a negligible
relationship between course grades and the teacher performance evaluations."
They then describe a study of the ratings of 4 teachers by 331 students in 7
psychology sections at Colorado State College. They found a strong positive
correlation, significant at the .01 level, in some classes and not in
others. Pooled ratings for the 7 sections showed positive correlations,
also significant at the .01 level. They conclude "... even though large
correlational studies indicate that students' grades and evaluation of the
teacher are not importantly related, this relationship should not be dismissed
lightly. The effect is likely to be idiosyncratic for both teacher and
course, and should be considered in planning or interpreting teacher evaluations."

1972-1: R.B. BAUSELL AND JON MAGOON: "Expected Grade in a Course, Grade
Point Average and Student Ratings of the Course and the Instructor."
Educational and Psychological Measurement, 32, 1972.

Bausell and Magoon examined 12,000 ratings taken university-wide at the
University of Delaware in fall, 1969. They report "... the present study
found strong consistent biases in both instructors and course ratings
which can be traced to (a) the grade the student expects to receive and
(b) the discrepancy between the students' expected grade and his G.P.A.
The relationship between the G.P.A. and rating alone is negligible, and should
not be considered an important source of bias." The coefficient of correlation
between expected grades and ratings was found to be .62 and between discrepant
grade and ratings .53, significant beyond the .001 level.

1972-3: ALAN NICHOLS AND JOHN SOPER: "Economic Man in the Classroom."
Journal of Political Economy, 80, Sept-Oct, 1972.

Nichols and Soper, studying 339 social science sections at Central Michigan
University in fall, 1970, compared section mean expected grades and section
mean instructor ratings and found a correlation coefficient of .53. They
suggested that the university's grades were again on the rise following the
introduction of a compulsory evaluation-by-student system. They also suggest
that by raising the mean grade point average of a section a half grade
level, an instructor could expect a half grade level rise in his mean
section ratings.


1972-4: W. ROBERT KENNEDY: "The Relationship of Selected Student Characteristics
to Components of Teacher/Course Evaluation Among Freshmen
English Students at Kent State University." Paper & Symposium
Abstracts of the 1972 Meeting of the American Educational Research
Association.

Sample: 549 freshmen in English 160 at Kent State University, Fall, 1970.
Findings: grade point averages, final grades and expected final grades all
correlated significantly with teacher ratings. Student ability, as measured
by A.C.T. scores, did not seem to be related to teacher rating.

Comment: This is the third study available only in abstract and
specific figures are lacking. The tone of the abstract suggests quite strong
associations between grades and ratings, but it and the other two have been
put under the "slight to moderate" heading on Table D because of uncertainty.
In none of the three abstracts does the summary suggest that the relationship
is negligible.



1972-5: DAVID S. HOLMES: "Effect of Grades and Disconfirmed Grade Expectancies
on Students' Evaluations of Their Instructor." Journal of Educational
Psychology, 63, No. 2, 1972.

In an introductory psychology class of 97 students at the University of Texas
course grades were based on four objective tests. After completing three
of the tests each student knew exactly what grade he had earned up to that
time. Students wrote the grade they expected to receive in the course on
the final test paper, after being promised that the expected grade would in
no way influence the final grade. When the students returned to collect
their final exam and learn the final grade, half of those who had both
expected and earned A's and half of those who had both expected and earned
B's were told that their final grade was one level lower than they expected.
The other half was told the truth. They then completed teacher rating forms
before they all got their finals back and learned there had been an
experiment. No difference was found between the ratings given by A and B
students, but those whose grade expectancy had been disconfirmed rated the
teacher significantly lower on teacher preparation, lecture, coherence, use of
examples, ability to evaluate, value to the students and test clarity.



1973-1: BARAK ROSENSHINE, ALAN COHEN AND NORMA FURST: "Correlates of
Student Preference Ratings." Journal of College Student
Personnel, 14, May, 1973.

The study was of 12^0 daytime classes in all the schools and colleges of
Temple University in Spring, 1970. The methods of administration and the
reasons for administering the scales are not discussed. The authors found
correlations of .09 to .27 between expected grades and two questions that
asked the student to compare the instructor and the course with others. A
four point rating scale was used. They found no correlation between grade
point averages and ratings. They conclude that the effect of expected grades
on ratings, though statistically significant, is low.

Comment: The Rosenshine rating form asked the students to rate
teachers on 23 items measuring classroom style and behavior. Of special
interest to English teachers are the three items that showed the lowest
correlations with student appreciation of the class and of the instructor.
They are:

    Criticism of papers was helpful to the students                .26
    Instructor used assigned papers as an aid to learning          .21
    Instructor criticised student responses in destructive way    -.16

Low correlations were also found for the following variables: a) independent
projects and papers, b) class participation, c) creative thinking,
d) application and appreciation were important for the final grade.

Much higher correlations were found for the following items:

    Instructor's main emphasis was on student's learning           .40
    Grading in the course was fair                                 .46
    Instructor's main emphasis was on having the
      students enjoy the course                                    .50
    Instructor was enthusiastic                                    .57
    Instructor's presentation was clear and
      understandable                                               .62 (the
                                                                   highest of 23)




The authors comment that though criticism of papers is often cited as being 
important for college teaching, its importance was not borne out by the 
data collected in the study. 



These findings seem to suggest that the following teacher behavior may not
be conducive to high teacher ratings:

1) assigning complex, symbolic, hard-to-explain readings
2) emphasizing learning over student enjoyment
3) being unfair in grading, grading harder than one's peers
4) counting papers toward the final grade, particularly if the
   papers require creative thinking and application of knowledge
5) writing negative criticism on papers



1973-2: RICHARD R. PERRY AND REEMT R. BAUMANN: "Criteria for the Evaluation
of College Teaching: Their Reliability and Validity at the University
of Toledo." Proceedings, The First Invitational Conference on
Faculty Effectiveness as Evaluated by Students, ed. Alan L. Sockloff,
Temple University Measurement and Research Center, 1973.

Perry and Baumann, analyzing 900 student ratings at the University of
Toledo in Spring 1972, found correlations of up to .78 between class mean
expected grades and class mean ratings, with an average of .42 for all levels
of the institution. They said of the rating scales "the indictment of the
validity is very strong; what the correlations reveal is that variations in
course ratings is accounted for to the extent of 30 to 60% by the grades
assigned... this problem must be resolved in some fashion before one can
build a reasonable case for validity."



1973-3: JOHN A. CENTRA AND ROBERT L. LINN: "Student Point of View in Ratings
of College Instruction." An Educational Testing Service Research
Bulletin, October 1973, ERIC Document 089581.

The study was of 300 randomly selected students from 402 classes in 5 colleges.
Grades were found to be "moderately" related to ratings though not in all classes.
No specifics were given. The authors say that their findings underscore the
importance of context of the course in determining ratings.

Comment: This 1973 Centra and Linn study does not seem to be mentioned
in the 1974 ETS SIR (Student Instructional Report) manual of interpretation
or in the portfolio of materials ETS distributes to advertise the SIR
rating scales.

1973-4: ROLF MIRUS: "Some Implications of Student Evaluation of Teachers."
Journal of Economic Education, 3, No. 1, 1973.

Mirus studied 122 course evaluations (unstated number of students) at the
Faculty of Business Administration and Commerce at the University of Alberta
in 1971-72. He compared mean section expected grades with mean section
instructor ratings, section by section. Finding a correlation of .85, he
states, "There is a strong indication that the expected grade is a major
determinant of the professor's grade... A professor who, compared to his
colleagues, makes the class expect a 1.00 point higher grade can improve
his own evaluation .85 of a point." Mirus suggests that the higher coefficient
of correlation between grades and ratings found in this study, as compared
to the Nichols and Soper study, arises because the career orientation of the
Alberta students makes them more responsive to grades. Mirus asserts that an
updrift of institutional grades can be expected as a result of the
evaluation system. A statistically significant higher average grade was reported
in 1972 as compared to 1971.
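
The quoted statement treats the .85 correlation as a point-for-point slope. The sketch below shows the arithmetic behind such a reading, assuming a simple least-squares line fitted to section means. The standard deviations are hypothetical illustrations, not figures from the Mirus study, and the function name is invented for this example.

```python
def predicted_rating_change(r, sd_grades, sd_ratings, grade_change):
    """Predicted change in mean rating for a change in mean expected grade.

    In simple linear regression the slope equals r * (sd of y / sd of x).
    """
    slope = r * sd_ratings / sd_grades
    return slope * grade_change


# Equal spread in both variables (hypothetical): the slope equals r itself.
equal_spread = predicted_rating_change(
    0.85, sd_grades=0.5, sd_ratings=0.5, grade_change=1.0
)

# Ratings half as spread out as grades (hypothetical): the slope is halved.
narrow_ratings = predicted_rating_change(
    0.85, sd_grades=0.5, sd_ratings=0.25, grade_change=1.0
)

print(equal_spread, narrow_ratings)
```

On this reading the .85-point figure holds when section mean grades and section mean ratings vary by about the same amount, a condition the summary does not report.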

1973-5: K.L. GRANZIN AND J.J. PAINTER: "A New Explanation for Students'
Course Evaluation Tendencies." American Educational Research Journal,
10, No. 2, 1973.

The authors gave first day of class "expectation" questionnaires and last
day of class "rating" questionnaires to 637 students in 17 courses offered
in 11 different departments at the University of Utah. Among correlations
found to be significant at the .001 level of confidence are:

    Course rating to expected grade                              .21
    Course rating to final grade                                 .15
    Course rating to expected grade change (higher rating at
      the end than expected at beginning)                        .18
    Instructor rating to expected grade                          .16
    Instructor rating to expected grade change                   .14

No significant correlation was found between student grade point averages
and ratings. Final grades (as contrasted to expected grades) correlated
.09 with instructor rating.

1973-6: ALLEN J. SCHUH AND MICHAEL A. CRIVELLI: "Animadversion Error in
Student Evaluations of Faculty Teaching Effectiveness." Journal
of Applied Psychology, 58, No. 2, 1973.

A class of 85 students in a required business administration degree course
in industrial relationships was asked to rate their instructor immediately
after he had returned their midterm exams. The instructor left the room
while the ratings were administered. Ratings were found to be associated
with midterm grades beyond the .001 level of significance.



1974-1: C.D. CORNWELL: "Statistical Treatment of Data from Student Teaching
Evaluation Questionnaires." Journal of Chemical Education, 51, 1974.

Sample: An unstated number of students in 101 different chemistry lecture
sections taught by 70 different lecturers in 20 different institutions. The
data was collected by a committee on Undergraduate Teaching of the American
Chemical Society. Findings: Statistically significant but weak relationships
were found between grades and ratings. The researcher estimates that the
grades account for lA^f the variance in ratings.




1974-2: WILLIAM M. BASSIN: "A Note on the Biases in Students' Evaluations
of Instructors." The Journal of Experimental Education, 43, No. 1, 1974.

Mean grade point averages by & teachers at the University of Maryland
were compared with the mean ratings given them by students. Bassin
found an overall coefficient of correlation between grades and ratings
of only .10. However, he found that this minor correlation was associated
with a major effect on teacher rankings. The average teacher teaching a
quantitative course, giving a grade point average of 2.0, ranked at the
30th percentile in student rating of lecture quality. The average teacher
teaching a quantitative course, but giving a 2.5 grade point average, ranked
at the 62nd percentile in student rating of lecture quality.

Examination of 22 of the 28 studies made since 1953 shows clearly that
grade-rating correlations do exist and that the associations between
grades and teacher ratings can be quite powerful. Before reaching a conclusion
about the relevance of these studies to the Harper English study it would
pay to look rather carefully at and comment on the 19 studies published
since 1930 that have led many people to believe that grades and ratings
are unrelated.




Studies Showing Negligible Grade-Rating Correlations



Column 3 of Table D shows that the authors of 19 of the 41 studies
have concluded that the ratings of teachers are not biased by the
grades the teacher gives. The 13 studies made before the Anikeef
study of 1953 were unanimous in taking this position. Since 1953,
six of the 28 studies have supported the no-bias position. Detailed
summaries and comments follow.



1930: H. H. REMMERS: "To What Extent Do Grades Influence Student
Ratings of Instructors?" Journal of Educational Research,
21, 1930.

Remmers of Purdue popularized the use of rating scales in colleges.
The study, first published in a shorter form in 1928, made use of
17 classes. Seven were high school classes taught by practice teachers.
Ten were college classes taught by four different instructors. Data
was collected as follows. When the students completed the rating,
the teacher read off the names of those students ranking in the top
half of the class, asking them to put an X on the form. Remmers then
correlated non-X and X ratings within each class. Some classes showed
positive grade-rating correlations, others showed negative correlations.
When he averaged all the correlations from 17 classes, he found a mean
correlation of only .070 "at the most". He therefore concluded "...for
the average instructor and the average student there is practically no
relationship between a student's grade and his judgement of the instructor
as recorded on the Purdue Scale for Instruction."

Comment: The study is of course a collection of 10 separate
"within class" studies of the classes of four college teachers, subjects
and methods unstated. Remmers did not compare teachers, even though
rating scales by their very nature do compare teachers. It is not to
be expected that single within-class studies of this type will always
tell something about the relationship of grades to ratings. They can
not detect differences in grading style, nor will they in all cases
show positive correlations between abilities and ratings, even when
powerful associations between the two exist throughout a department
or a college. The low ability and the high ability students of a
"high" grading teacher may be equally happy with him and give him
equally high ratings, since all are exposed to the same grading style
and equal numbers from both groups may be earning higher grades than
they are accustomed to making elsewhere. Likewise low and high ability
groupings from classes of "low" graders may contain roughly equal numbers
of students who are experiencing more trouble with grades than they
are accustomed to, or than their peers are experiencing in comparable
classes. The result could be equally low ratings from both groups
and again a lack of positive correlation between grades and rating.
If Remmers had pooled results across sections then drawn coefficients
of correlation for the entire group, he would perhaps have detected
the small correlation that has sometimes been found in studies in which
student abilities, as indicated by their grade point averages, have
been correlated with teacher ratings. Among studies that have not
found the gpa correlation are study 1972-1 (Bausell and Magoon) and
study 1954 (Clark and Keller). Among those that have found small
positive ability correlations is study 1950-2 (Elliott). (See 1971-2,
Wiegel, for a small study of 4 teachers, similar to the Remmers
study but with quite different results, when A, B, C, D grades were
pooled across sections.)
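
The point that within-class studies cannot detect differences in grading style can be illustrated with a small sketch. The data are invented for the illustration (two hypothetical teachers, four students each) and are not taken from Remmers or any other study listed here; `pearson` is a plain textbook implementation written for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: (grade points, rating) for each student.
high_grader = [(3, 5), (3, 6), (4, 5), (4, 6)]  # lenient teacher
low_grader = [(1, 3), (1, 4), (2, 3), (2, 4)]   # severe teacher


def class_correlation(students):
    """Correlate grades with ratings over one group of students."""
    grades = [grade for grade, _ in students]
    ratings = [rating for _, rating in students]
    return pearson(grades, ratings)


within_high = class_correlation(high_grader)
within_low = class_correlation(low_grader)
pooled = class_correlation(high_grader + low_grader)

print(within_high, within_low, pooled)
```

In this invented case each class taken alone shows a grade-rating correlation of zero, while the pooled students show a correlation of .8, because the whole association lies between the two teachers and not within either class.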

1934: J. A. STARRAK: "Student Rating of Instruction." Journal of
Higher Education, No. 5, 1934.

Starrak reports that 40,000 scales have been taken at Iowa State College
since 1928. He gives no details of the method of collection or size
of the sample used to reach the conclusion that the correlation between
grades and ratings is only .15. This correlation he believes is small
enough to be disregarded.




1936-1: J. D. HEILMAN AND D. ARMENTROUT: "The Rating of College
Teachers on Ten Traits by Their Students." The Journal of
Educational Psychology, 27, 1936.

The authors studied ratings taken in 50 classes taught by 46 different
teachers at the Colorado State College of Education in Spring 1935.
Average class size was 42. Teachers apparently administered their
own ratings and voluntarily turned them in. The authors found
a severity of grading score for each teacher by averaging all grades
he assigned in the 1935-36 school year. The severity of grading
scores were then compared with mean section student ratings. The
correlation found was -.042. The authors therefore conclude that
there was no relation between student grades and teacher ratings. They
comment at some length, however, on the difficulties individual instructors
had in interpreting the meaning of the scales. Average section
standard deviations were very high. One instructor, for example,
was found to have a standard deviation of 27.30 on the 100 point scale,
suggesting such a wide scatter of student opinion as to deny the
existence of a center.

Comment: The Heilman and Armentrout study is well designed to
detect the influence of grades on ratings. It is an admirably detailed
study. Though the teachers took their own ratings, somewhat weakening
its believability, it seems to this reviewer to be the only study in
the literature which truly supports the conclusion that the grades a
teacher gives and the ratings he receives can be unrelated. One may,
however, wonder about the size of the standard deviations found. Their
size may indicate that something was wrong with the scale or
administration.





1936-2: MILTON L. BLUM: "An Investigation of the Relation Existing
Between Students' Grades and Their Ratings of the Instructor's
Ability to Teach." Journal of Educational Psychology, 1936.

This is a study of 57 students in two 8-week summer psychology classes
taught by the same teacher at City College of New York. Blum found no
relationship between expected or final grades and instructor rating.
Forty of the 57 students were expecting A's and B's at the time of
rating. Sixteen were expecting C's, one a D.

Comment: Lacking an experimental design such as those found
in the single teacher "within" class studies of Schuh and Crivelli
(1973-6) and Holmes (1972-5), this study merely adds two more within-class
studies to the 10 found in Remmers (1930). They show
that a teacher need not always expect to find positive grade-rating
correlations within his own classes. The study is of no value in
telling the teacher how the grades other teachers give affect his
ratings.

1949: H. H. REMMERS, F. D. MARTIN AND D. N. ELLIOTT: "Are Student
Ratings of Instructors Related to Their Grades?"
Purdue Studies in Higher Education, 66, 1949.

The study evaluated 37 graduate assistants teaching the lab and
recitation sections of the freshman chemistry course at Purdue. The
senior professor who gave the lecture-demonstration was not evaluated.
The graduate assistants had little to say about course grades, exams
being standardized and departmentally graded. The researchers divided
the students into two groups: the plus group consisted of those whose
final grades were higher than pre-course placement tests predicted;
the minus group received grades lower than predicted. The plus groups
were found to rate their instructors significantly higher (.13 to .35)
than the minus group. Since the researchers found no relationship between
placement test scores and ratings, and since the assistants did not
control the grades, they conclude that the connection found is not that
higher grades cause better ratings but that better teaching causes higher
ratings. The authors also gave their attention to the phenomenon that
Remmers had first noticed 20 years before, the fact that some of the
classes in his within-class study of 1928-30 showed a negative correlation
between grades and ratings, others showing positive correlations. They
now suggested that some teachers are good at teaching high ability students
and poor at teaching low ability students. These teachers, they reason,
will receive poorer ratings from the weaker students and thus show positive
correlations between grades and ratings. On the other hand, teachers who
are best at teaching low ability students will alienate some high ability
students and show negative correlations between grades and ratings.

Comment: The Remmers explanation of his findings of 20 years
earlier may suggest the interesting possibility that the best method of
achieving consistently high mean student ratings would be to teach to the
abler students and to see to it that the less able were not disaffected,
that is, make the weaker students feel successful. See Holmes (1972-5) for
a possible explanation of the action of the minus group. Also see Bausell
and Magoon (1972-1).

1950-1: DONALD N. ELLIOTT: "Characteristics and Relationships of Various
Criteria of College and University Teaching." Purdue University
Studies in Higher Education, 70, 1950.

Donald Elliott was Remmers' assistant in the Division of Educational Reference
at Purdue. His first study seems to be a continuation in greater detail of
Remmers, Martin and Elliott (1949). Freshman chemistry assistants were
again involved. Only 9% of the assistants had previous teaching experience.
Most did not plan to become teachers. The senior lecturer was not evaluated.
The assistants had little to say about the grades. Tests were departmentally
designed and evaluated. Elliott found correlations of grades to ratings of
only .032 for lab sections and .049 for recitation sections. He did, however,
find a correlation of .24 between ratings and achievement, that is, he found
that students who got better grades than their pre-course tests indicated
they would tended to rate their teachers higher than students who did not
achieve as much. Elliott also found a negative correlation between student
achievement and teacher knowledge of chemistry. The students who achieved
most (as measured by grades higher than predicted by placement tests) tended
to be most often in classes taught by teachers who scored lowest on a test
of knowledge of chemistry.

Comment: This final finding of Elliott is fascinating. He who knows
least teaches best, and gets the highest student ratings. Its significance
to the Harper English Department study is unknown, but one may speculate.
The problem Elliott faced in this study is the one faced by Remmers, Martin,
Elliott (1949) and by all the many researchers who have tried to prove that
student ratings of faculty are related to what the student learns. The
problem is that one can never be certain that the student who scored
higher on an exam or received a final grade higher than his GPA or aptitude
tests indicated he should is rating his teacher higher because he has
learned more than he expected or because his grade is higher than he
expected it to be. Again, refer to the Bausell and Magoon study (1972-1)
for an explanation of the discrepant grade effect.

1950-2: DONALD N. ELLIOTT: (The Second Study Found in "Characteristics
and Relationships of Various Criteria of College and University
Teaching," Above.)

This study was the second of two undertaken by Elliott as material for his
doctoral dissertation. According to Elliott 26,014 ratings of 460 instructors
had been collected from 14 Indiana colleges and universities as part of the
Indiana College Evaluation Program. He mentions the numbers exactly. However,
he says only those ratings taken at Purdue contained information about
grades. At Purdue, the instructors, following Remmers' plan of 1928-30,
asked their upper-half students to put an X on the forms. A total of 3786
ratings (1906 upper and 1880 lower) were then available for comparison. The
ratings were pooled, not treated as within-class ratings as in 1928-30.

In all categories except that of the graduate student the upper half students
rated the instructor higher than the lower half of class. Sample mean scores
from the scale that were found significant at the .01 level of confidence
were:

                                          Upper     Lower
    Fairness in grading                   89.15     82.40
    Presentation of subject matter        75.80     73.35
    Stimulating intellectual curiosity    75.05     72.65

The lower half students gave slightly lower ratings for every other item on
the 10 point scale. Elliott concludes "....the factor of scholastic success
has such a slight effect, albeit the effect is statistically significant, as
to be virtually ignorable, particularly when it is recalled that most classes
are made up of students of widely varying scholastic success."

Comment: The Manual of Instruction for the Purdue Rating Scale
for Instruction by H. H. Remmers and J. A. Weisbrodt (Revised edition,
1965), copyright by the Purdue Research Foundation, contains the following
paragraph as its total contribution to the grade-rating controversy:

    "Several questions have been raised regarding other factors
    that might affect the student ratings of instructors. Remmers
    and Elliott (16) have answered many of these questions. In
    a study of the ratings of 460 instructors by 26,014 raters
    in 10 different institutions of higher learning they found
    that freshmen rated their instructors no higher and no lower
    than did seniors, male students rated their instructors no
    differently from female students, veteran students rated
    their instructors similarly to non-veteran students, and
    students in the upper half of the class rated their instructors
    like those in the lower half. None of these
    factors had any effect on the ratings by the students."

Someone is mistaken, either Elliott or the editors of the Manual. It is
unlikely that two separate studies would start with exactly 26,014 ratings.

1951: EARL HUDELSON: "The Validity of Student Rating of Instructors."
School and Society, 73, 1951.

This is a one teacher study with a difference. Hudelson asked his 192 students
to rank their former teachers anonymously. He then asked them to give the
grades they had received from the teachers. Finding a correlation of only
.19 between grades and ratings, he concludes, "Obviously these students
could not fairly be charged with letting marks influence their opinions of
their instructors as teachers."

Comment: The system of collecting data, somewhat similar to that
later used by Voeks & French (1952-1, 1952-2), could lower the positive
correlation since it removes from the sample those who left
school because of the low grades they received. Hudelson is the only
researcher found in the literature who provides a scatter diagram to illustrate
the association between grades and ratings. Though he did not give
teacher ratings for each grade level, it is instructive to the reader to
examine the scatter diagram closely and to do his own arithmetic. If he does
so, he will discover that the weak .19 correlation was produced by the
following data:

    The 38 A students gave mean ratings of 6.8.
    The 87 B students gave mean ratings of 6.5.
    The 57 C students gave mean ratings of 5.6.
    The 8 D students gave mean ratings of 5.6.

All ratings were on a 10 point scale. The teacher who gave an average
grade of C to his classes might thus expect to produce average student
ratings about 9/10ths of a decile below those who gave an average grade
of B. In the 1974-75 Harper English study the average mean final grades
of the low graders was 2.33, of the high 2.80, a difference of a half
grade level. The average mean ratings of the high graders was 5.22 on
the 6 point scale and of the low 4.78, a difference of approximately
8/10ths of a decile. The Harper grade-rating relationships are therefore
seen to be approximately twice as strong as those found by Hudelson in
1951, results not inconsistent considering Hudelson's method of collecting
data and the merit system at Harper. It is obvious, in spite of
Hudelson's conclusion, that the 1951 study does not show a negligible
association between grades and ratings, but rather shows the opposite.
It is possible that Starrak (1934) with his .15 correlation "small enough
to be disregarded" also belongs in another column in Table D.
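
The decile arithmetic in this comment can be retraced with a short sketch. It assumes that a "decile" means one tenth of a scale's top value (so 0.6 of a point on the 6 point scale); the comment does not define the term, and the variable and function names are hypothetical.

```python
# Figures quoted in the comment on Hudelson (1951) and the Harper study.
hudelson_mean_rating = {"A": 6.8, "B": 6.5, "C": 5.6, "D": 5.6}  # 10 point scale
harper_grades = {"low": 2.33, "high": 2.80}   # mean final grades
harper_ratings = {"low": 4.78, "high": 5.22}  # 6 point scale


def deciles(rating_gap, scale_points):
    """Express a rating gap in tenths of the scale (assumed meaning of decile)."""
    return rating_gap / scale_points * 10


# Hudelson: the gap across one full grade level (B versus C).
hudelson_per_grade = deciles(
    hudelson_mean_rating["B"] - hudelson_mean_rating["C"], 10
)

# Harper: the gap across roughly half a grade level.
harper_grade_gap = harper_grades["high"] - harper_grades["low"]
harper_gap = deciles(harper_ratings["high"] - harper_ratings["low"], 6)
harper_per_grade = harper_gap / harper_grade_gap

ratio = harper_per_grade / hudelson_per_grade
print(f"Hudelson: {hudelson_per_grade:.2f} deciles per grade level")
print(f"Harper:   {harper_gap:.2f} deciles over {harper_grade_gap:.2f} grade levels")
print(f"Harper/Hudelson ratio per grade level: {ratio:.1f}")
```

Under that assumption the Harper gap comes to about 0.73 of a decile over 0.47 of a grade level, and the per-grade-level effect is about 1.7 times Hudelson's, in line with the comment's "approximately twice as strong".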

1952-1, 1952-2, 1952-3: VIRGINIA W. VOEKS AND GRACE FRENCH. "Are Student-Ratings of Teachers Affected by Grades?" Journal of Higher Education, 31, 1960.

These three studies, which were specifically focused on the grade-rating relationship, were part of a series on a number of aspects of teacher evaluation undertaken under the direction of E. R. Guthrie at the University of Washington. The research was done in 1952, but publication was delayed until 1960.

Data for the first two studies were collected at spring registration. Students of advanced sophomore or higher rating were asked to nominate teachers who fitted the five categories of the Washington teacher rating scale: very superior, superior, competent, only fair, of less value to me than the others.

The researchers then computed mean ratings for those teachers nominated 20 or more times. They also collected the grades these teachers had assigned during the preceding two terms.

Study 1952-1: In the first study the researchers drew rank order correlations in three departments between student ratings and 1) percentage of A's and B's and 2) percentage of D's and F's the teacher gave in the preceding two terms. They report: "...all the correlations between grades and student ratings were negligible (see Table I). No correlations were reliably greater than zero, even at the 5 per cent confidence level." The essential parts of Table I are reproduced below.

                                     Correlation of the Ratings Assigned
                          No. of     by Students and the Percentage of
Department                faculty    Each Grade Given

                                     A & B      C          D & F

A (Physical Science)      10          .00      -.[??]      +.04

B (Physical Science)      11         +.60      -.[?]7      -.05

C (Humanities)            13         +.05      -.[?]1      +.36

Comment: It is difficult to understand why the authors chose to display the data in the above way. The mean grade point average of each teacher should have been available. Correlations drawn between mean ratings and mean teacher grades would have been much more useful to the reader. The strong correlation between A's and B's and the ratings in Department B suggests that a rank order correlation based on mean grades could approach the correlation levels found in the Harper English studies. In the other departments the negative correlations under the C's suggest that a rank order coefficient based on mean grades might produce correlations in the range of .15 to .30. The correlation of .36 under the D & F column in Department C is of little significance in a study that eliminated many D students and lower level C's from the sample by taking ratings only from those who survived to at least advanced sophomore status. The statement "No correlation was reliably greater than zero, even at the 5 per cent confidence level" has little meaning when a study is restricted to 10 to 13 teachers. Coefficients of correlation have to be in the range of .55 to .65 before significance can be claimed with such limited numbers.
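The closing point about sample size can be checked with a rough calculation. The sketch below is an illustration only: it assumes the usual t approximation for testing a correlation coefficient (strictly a Pearson-type test, applied here to rank correlations as an approximation) and uses two-tailed 5 per cent critical values of t copied from standard tables. The function name is hypothetical.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t from standard tables,
# keyed by degrees of freedom (n - 2 for a correlation on n pairs).
T_CRIT_05 = {8: 2.306, 9: 2.262, 11: 2.201}

def critical_r(n_teachers):
    """Smallest |r| reaching the two-tailed 5% level for n_teachers pairs,
    using r = t / sqrt(t^2 + df) with df = n - 2 (an approximation)."""
    df = n_teachers - 2
    t = T_CRIT_05[df]
    return t / sqrt(t * t + df)

# Department sizes reported in Table I: 10, 11 and 13 faculty.
for n in (10, 11, 13):
    print(n, round(critical_r(n), 2))
```

On these assumptions the thresholds fall between roughly .55 and .63 for departments of 10 to 13 teachers, close to the range stated above.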

Study 1952-2: In the second study the authors compared the highest and lowest rated teachers in each of 10 large departments. They provide Table II to show results. The essential parts of the table follow:






                   No. of     Mean Grade       No. of     Mean Grade
Department         Students   Highest Rated    Students   Lowest Rated

Architecture         81       3.185            151        2.338
Art                 126       3.206             90        2.400
Chemistry            79       2.633            239        2.155
Economics           365       2.123             68        2.838
Education           134       2.888            241        [illegible]
English             165       3.062             62        2.145
Math                100       2.350             45        2.155
Political Sci.      26[?]     2.588             62        2.564
Psychology           97       2.588            439        2.414
Sociology           237       2.477             81        2.222

The researchers comment:

     As Table II shows, the teacher with the highest student-rating in his department usually had given a slightly higher average grade than the teacher with the lowest rating... These differences are very slight and often not statistically significant. Analysis shows no reliable difference between the mean grades given by the ten teachers with high student-ratings and the mean grades given by the ten teachers with low student-ratings... In the relatively rare instances in which a teacher with high ratings also gave appreciably more high grades, it is evident that he did not receive higher grades because he gave more than the average number of low grades.

Comment: The comparisons in the above table are perhaps unfair, large-section lecturers, who may not be personally involved in grading, being compared with seminar teachers. However, the table does show that in 9 out of 10 departments the highest rating went to the man with the higher grades. In four departments the difference is quite large. The English Department difference, 90% of a grade level, is close to the difference between the highest and lowest rated teacher in the Harper English study. It is difficult to see how the authors could make their generalizations on the basis of the data they display, particularly since the method of collecting ratings would serve to eliminate disaffected low-graded students.
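The reader who wishes to repeat the tally made in the comment above can do so with a few lines. This is a sketch using the Table II figures as they can be read in this copy; the Education entry for the lowest rated teacher is illegible here, so it is recorded as missing and left out of the count rather than guessed.

```python
# Table II: (mean grade of highest rated teacher, mean grade of lowest rated).
# None marks a figure that cannot be read in this copy of the table.
table_ii = {
    "Architecture":   (3.185, 2.338),
    "Art":            (3.206, 2.400),
    "Chemistry":      (2.633, 2.155),
    "Economics":      (2.123, 2.838),
    "Education":      (2.888, None),
    "English":        (3.062, 2.145),
    "Math":           (2.350, 2.155),
    "Political Sci.": (2.588, 2.564),
    "Psychology":     (2.588, 2.414),
    "Sociology":      (2.477, 2.222),
}

# Difference in mean grade, highest rated minus lowest rated teacher.
diffs = {
    dept: round(high - low, 3)
    for dept, (high, low) in table_ii.items()
    if low is not None
}

# Departments where the highest rated teacher also gave the higher grades.
higher_graded = [dept for dept, d in diffs.items() if d > 0]

for dept, d in sorted(diffs.items(), key=lambda item: -item[1]):
    print(f"{dept:15s} {d:+.3f}")
print(len(higher_graded), "of", len(diffs), "legible departments")
```

Among the nine legible rows, Economics is the only department in which the lowest rated teacher gave the higher grades, and the English difference is a little over nine tenths of a grade level.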

Study 1952-3: In the third study the researchers found 16 teachers who had given the Washington course evaluation questionnaire to different sections of the same course, the ratings being taken at least one quarter apart, and who had scored at least three deciles higher on the second administration than the first. They then compared the grades the teacher had given the first time with the grades they had given when they scored the higher rating to see if the grades had gone up with the ratings. They supply a complex table that shows whether the difference in grades could be explained by chance. They found that one teacher had given appreciably lower grades to the class that gave him the higher rating. On page 333 they report that two teachers had given higher grades approaching statistical significance (.07 and .[illegible]) to the second class. On page 334 they report that only one teacher had given appreciably higher grades the second time. They conclude: "...usually the grades given to the two classes were strikingly similar... apparently high ratings cannot be 'bought'..."

Comment: It is unfortunate that the authors did not take the very simple step of placing opposite each other the section grade point averages given by low-rating and high-rating classes. Instead they elected to give only the chi squares of differences in grades in the two classes. This is of course not very helpful to the reader since it deprives him of the opportunity of seeing whether the majority of the higher rated classes got somewhat higher grades, and it also does not allow him to see if the highest ratings, those in the 8th, 9th, and 10th deciles, were accompanied by high section grades. If either of these situations existed, one might be tempted to take the Voeks and French studies and put them under the column in Table D that shows at least moderate associations between grades and ratings.

Professor Eble often points to the University of Washington as the model of good evaluation practices. In "What Are We Afraid Of?" he criticizes the "abuse of research" shown by Miriam and Burton Rodin in their article in Science that suggested students rate highest those teachers from whom they learn the least. Professor Eble uses the following terms:

     The authors' (Rodins') omission of relevant research is curious. Though Virginia Voeks' article "Publications and Teaching Effectiveness" is cited, a more relevant article by Voeks and G. M. French, "Are Student-Ratings of Teachers Affected by Grades," with conclusions again the opposite of the authors', is not. Perhaps this is because Voeks' work is based on careful study of data amassed at the University of Washington, where almost 50 years of experience with student evaluations supports the conclusion that student evaluations do correlate with teaching effectiveness.

It seems to this reviewer that the data amassed at the University of Washington may have been somewhat distorted by the student rating scale used to amass it. It is the one encountered in the study of the large literature of student evaluation that is most curiously lacking in parallel structure. The first four rating categories, 1) very superior, 2) superior, 3) competent, 4) only fair, are standard enough, but the fifth category, "of less value to me than the others," suddenly invites the student to switch from an evaluation of the instructor to an evaluation of the course. Even though Professor Eble went out of his way in his AAUP-AAC-Carnegie supported study, The Recognition and Evaluation of Teaching, to praise it as a model for other colleges to copy, it is difficult to see how valid rating data could be collected from it.

The Voeks and French studies have been of major importance to the literature of student evaluation of faculty. They are quoted in almost all the important review literature of the past dozen years. Their publication in 1960 negated the effects of the Anikeef (1953-3) and Weaver (1960) studies that had shown positive grade-rating correlations. In the opinion of this reviewer the Voeks and French studies had the following faults:

1. The instrument used to collect the data was questionable.

2. The method of collecting the ratings invited bias.

3. The data collected was not displayed in the most natural way.

4. The conclusions reached did not follow naturally from the data that was displayed.

1953-1: A. W. BENDIG. "The Relation of Level of Course Achievement to Students' Instructor and Course Ratings in Introductory Psychology." Educational and Psychological Measurement, 13, 1953.

Bendig studied 5 introductory psychology courses (132 students total) at the University of Pittsburgh in Spring 1951. He found positive correlations of .14 to .28 between grades and ratings. He concludes: "Student achievement does affect the rating, but not to a degree that invalidates continued use of the scales."

1953-2: A. W. BENDIG. "Student Achievement in Introductory Psychology and Student Ratings of the Competence and Empathy of Their Instructors." Journal of Psychology, 36, 1953.

In fall 1951, Bendig again studied 5 sections of introductory psychology (121 students). Grades were apparently based entirely on objective achievement tests. He found strong negative correlations (figure not given) between grades and rating. He cancels his spring findings as follows: "...the previously reported strong positive correlation between student achievement and summed ratings on the P.R.S.T. scales was a function of the factorial complexity of the scales." Bendig gives a possible explanation for the negative correlation: "Students of high overall ability may be more aware of inadequacies in the teaching of their instructors and to judge them more critically."

Comment: Either the presence of one unpopular high grader in a sample of five or a highly structured course, earning the contempt of abler students, could produce the results. Bendig's high negative correlation and Heilman and Armentrout's -.042 are the only studies of the 41 to show negative correlations between grades and ratings. If no correlation existed between grades and ratings, approximately 20 studies could be expected to show negative results.
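The last sentence of the comment is in effect a sign test, and it can be worked out directly. The sketch below assumes, for illustration only, that the 41 studies are independent and that with no true association each would be equally likely to show a positive or a negative correlation; neither assumption is tested in the text.

```python
from math import comb

n_studies = 41    # studies counted in the comment above
n_negative = 2    # studies that showed a negative correlation

# With no true association, half the studies should come out negative.
expected_negative = n_studies * 0.5

# Chance of seeing n_negative or fewer negative results out of n_studies
# when positive and negative signs are equally likely (binomial, p = 1/2).
p_at_most_observed = (
    sum(comb(n_studies, k) for k in range(n_negative + 1)) / 2 ** n_studies
)

print(expected_negative)
print(p_at_most_observed)
```

On those assumptions, two or fewer negative results in 41 studies would be an extremely rare outcome, which is the force of the reviewer's remark.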

1954: KENNETH E. CLARK AND R. J. KELLER: "Student Rating of College Teaching." In R. E. Eckert and R. J. Keller (eds.), A University Looks at Its Program. University of Minnesota Press, 1954.

A total of 15,000 ratings by students in 380 classes in the University of Minnesota College of Science, Literature and the Arts were collected in a voluntary program in 1949. Though the authors supply no specific data, they report they found little relationship between the student's overall grade point average as he reported it on the rating form and the teacher ratings. In fact, students with grade point averages below "C" tended to rate teachers somewhat higher in general teaching ability than other students. Only a few items such as quality of exams, ability of teachers to adjust to the level of the students and willingness to recommend the course to a friend were found to be related to ratings.





Comment: The interesting tendency of truly marginal D and F students to rate their teachers higher than C students has been observed in several other studies. The Clark and Keller study concerns itself only with the grade point average that the student brings to class. It is not a study of grades earned or expected within a class. Grade point average and placement test studies tend to agree that student ability is only marginally related to teacher ratings. Clark and Keller's study, of course, says nothing about the relationship between ratings and the grades the students were expecting from the instructor they were rating.



1962: C. M. GARVERICK AND H. D. CARTER: "Instructor Ratings and Expected Grades." California Journal of Educational Research, 13, 1962.

Sample: 164 students of one instructor in an introductory psychology course at Berkeley in two semesters. Findings: The grades the student expected and the grades the student thought he deserved had little relationship to teacher rating. The correlation was only .079.

Comment: A one-teacher study of this type proves little. (See the comments under 1936-2 (Blum).)



1966-1: C. L. OVERTURF AND E. C. PRICE: "Student Rating of Faculty at St. John's River Junior College With Addendum for Albany Junior College." 1966. ERIC Document ED 013 066.

A total of 10,000 ratings were taken college wide in 1964-65. The ratings were compulsory, the Dean of the college and the instructor waiting outside the door while the students completed the forms. The results were apparently used for merit pay and other personnel purposes. Teachers were ranked 1 to 91 according to their evaluations. Although the highest ranking instructor gave 72% A's and B's and the lowest 7.2% A's and B's, the authors report that when they ranked the 91 pairs (mean gpa given by teachers and mean ratings given by students) and applied Spearman's rank order equation to the two lists, they found a correlation of only .17, significant at the 10% level but not at the .05. Following the common statistical custom of not finding an association unless there is 95% certainty that the results could not have come about by accident, they state, "The statistical evidence does not support the conclusion that instructors awarding higher marks should expect a higher rating from his students."

Comment: The statistical evidence from St. John's River is unconvincing. Anikeef and several others have used Spearman's formula to find correlations between grades and ratings, correlations which incidentally turned out to be much higher than Overturf and Price found, but they worked with lesser numbers. The Spearman formula is believed to be accurate enough for most purposes when 15 to 30 pairs are being correlated, but it does not seem reasonable to expect it to handle 91 pairs. Overturf and Price were surely working with very large squares of difference in rank and with a number of ties needing correction. A. C. Crocker in Statistics for the Teacher, 19[?], says on page 58, "A simple method of calculating a correlation is the Spearman rank order correlation. This is useful for classes of children (or any set of scores) up to a maximum of thirty scores in each set. Beyond thirty the results tend to be unreliable."
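For readers unfamiliar with the rank order method under discussion, a minimal sketch of the simple Spearman formula follows. The data are invented for illustration and are not from any study reviewed here; the function names are hypothetical. The simple formula shown, 1 - 6 * sum(d^2) / (n(n^2 - 1)), is exact only when there are no tied ranks, which is the situation the comment on ties alludes to.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values
    the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Simple Spearman rank order coefficient from squared rank differences."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

# Hypothetical example: mean grade given and mean rating received
# for five imaginary teachers.
mean_grades = [2.1, 2.5, 2.8, 3.0, 3.3]
mean_ratings = [4.0, 4.6, 4.4, 5.0, 5.2]
print(spearman_rho(mean_grades, mean_ratings))
```

The same function accepts 91 pairs as readily as 5; what changes with larger samples is the number of ties and the size of the squared rank differences, not the arithmetic itself.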

1969-2: BERNARD CAFFREY: "Lack of Bias in Student Evaluation of Teachers." Proceedings of the 77th Annual Convention, American Psychological Association, 1969, Vol. 4.

Caffrey studied 131 students in three sections taught by three different instructors at Clemson University. The subject matter taught and the methods of grading are not discussed. He found that only 6 of the 46 items on the rating scale correlated beyond the .01 level of significance with course expected grades and only two correlated at that level with grade point average. The six positive correlations with expected grade ranged from .32 for the student's overall rating of the course to .23 for the instructor's ability to explain clearly. The author judged the effect of grades on ratings to be of little importance.

1971-3: MILTON HILDEBRAND, ROBERT C. WILSON AND EVELYN R. DIENST: Evaluating University Teaching. Center for Research and Development in Higher Education, University of California.

The authors undertook a study at the University of California at Davis designed to develop a rating system. As part of the study they took 1015 student ratings. The method of collecting data is unclear and no specific data is listed. They do however state that they found small positive correlations with grades which were significant at just beyond the .01 level. Their findings they believe are consistent with previous research, for they comment: "Cohen and Brawer (1969) reported similar results. Other studies have reported a relationship between expected grades and ratings of teachers (Stewart and Malpass, 1966; Weaver, 1960), a relationship only at lower class levels (Anikeef, 1953), and no relationship (Kent, 1967; Voeks and French, 1960). These contradictions seem consistent with the presence of a definite but trifling correlation."

Comment: Hildebrand and his associates seem to be mistaken on nearly all counts when they check the believability of their own findings by reference to past research. The original research study in Cohen and Brawer (1969) seems to say nothing about grades. Instead Cohen and Brawer refer to the doubtful Overturf and Price study (1966-1 above). A careful reading of Anikeef (1953-3) shows he did find associations between grades and ratings at the upper class level - .43 to be exact. Kent is a secondary source. Voeks and French (1952-60) did of course report no correlation.

1972-3: ALLEN C. KELLEY: "Uses and Abuses of Course Evaluations as Measures of Educational Output." Journal of Economic Education, 4, No. 1, 1972.

Sample: 258 students in two lecture sections in economics at the University of Wisconsin, Madison. Both sections were taught by the same professor. He was aided by 7 graduate assistants who met discussion sections once a week. The ratings were taken after the first midterm exam and before the second. Controls: Though ratings could not be anonymous, the students were assured that their identities would not be revealed. The senior professor left the room when ratings were taken. Controls not discussed are: first, the nature of the midterm exam, whether it was objective or essay and whether graded by computer, assistants, or senior professor; and second, whether or not the students believed the results would be used for personnel purposes. Findings: By constructing two simulated statistical models, projecting what would have happened if conditions in the course had been different, Kelley demonstrates that if students had received only A's and B's for the midterm exam the actual rating of the senior professor would have risen from 3.784 to only 3.860. Thus he finds that the impact of increased expected grades, though statistically significant, was very minor. The teaching assistants as a group were found to produce a negative effect on the senior professor's rating. The negative effect was caused largely by TA#5. If his students had been enrolled in the classes of TA#2 and TA#4, the senior professor's ratings, according to Kelley, would have been .28 higher.

Comment: No comment.




The Relevance of Past Research to the Harper English Findings

The evidence indicates that the widely-held belief that grades and ratings are unrelated is a myth. Further, it indicates that the myth seems to have been spread by those who have a vested interest in promoting it. The body of empirical research that supposedly underlies the no-relationship generalization turns out to be without real substance when one makes an effort to look at all the evidence, not just at selected studies or parts of selected studies. If a convincing body of evidence exists to support the generalization it has evidently not been published.

The "classic research," as McKeachie called it, of H. H. Remmers and his students at Purdue turns out to be: (1) an examination of four college instructors (subjects and methods of grading unstated) and seven high school practice teachers; (2) 31 graduate assistants in a chemistry course where the senior professor was not evaluated, where the assistants for the most part did not plan to become teachers and had little to say in assigning grades; (3) a study of an unstated number of Purdue instructors teaching unstated subjects, the study design guaranteeing that differences in student reaction to hard and easy teachers would be concealed; (4) an instruction manual reporting the findings of the research and in doing so turning one college into 10, an unstated number of instructors into 460, 3786 students into 26,014, and "virtually ignorable" differences into totally ignored differences.






The "careful study of data," as Professor Eble describes it, at the University of Washington, which next to the Purdue studies did the most to promote the no-relationship generalization, is revealed upon examination to be somewhat less careful than one might wish. And the same is true of the project at St. John's River Community College, a project much publicized among two year colleges, where the dean and the teacher stood together outside the classroom door, waiting for the ratings that would rank the teachers in order 1 through [91].

Of the 11 remaining no-bias studies, three are one-teacher in-class projects; one is a three-teacher study finding correlations of up to .32; two are conflicting five-teacher studies; and one is a grade point average study (Clark & Keller). One of the remaining studies (Starrak, 1934) reports a positive correlation of .15, but gives no details as to how the figure was reached. Another (Hudelson, 1951) seems to prove that grades are quite important when a correlation of .19 exists.

The case for the no-bias position rests largely on two studies, conducted 35 years apart. These are the Heilman and Armentrout study of 1936 (correlation -.042) and Hildebrand, 1971 (correlation about .09). The former is tainted by the size of the standard deviations and the latter by the lack of specific details about how the grade-rating correlations were drawn, the type of classes used as samples, and the like. In any event one must view them in conjunction with a large number of studies, many quite persuasive, that show otherwise. If we were to draw a frequency curve of all the 28 published grade-rating studies made since 1955, including all six studies in Column 3, translating all findings into correlation coefficients, the range would run from just under +.10 to +.90 with a fairly even distribution between. There seems to be a tendency for the correlations to be higher when the ratings are compulsory and tied in with a merit pay or faculty promotion system. They also seem to be higher when grading is subjective and when the teachers being rated are teaching multi-sectioned courses.

The Harper English study findings of 1973-74 and 1974-75 therefore do not seem to be atypical. They seem to fall easily into the patterns established by prior research. There is little doubt that a strong relationship between grades and rating exists in the English Department at Harper. It is doubtful that as high a correlation exists in other departments and divisions of the college, but it would be most surprising considering the history of the research to find any large transfer course subject area where it did not exist in some form.

It is customary among some statisticians to assert that correlations of less than .25 are virtually meaningless and that those of under .50 indicate something of no great importance, even when constant replication of results indicates that association exists beyond reasonable doubt. But when people are being ranked on a scale the assertion would seem to be open to question. Other things being equal, the one with only a slight advantage will be ranked ahead. Hudelson's correlation of +.19 and Bassin's (1974-2) correlation of +.10 with their corresponding shifts in percentile ranks illustrate this. When a limited number of promotions are being competed for, even small correlations become meaningful. The spread between the 4th and 10th teacher rank deciles in most rating scales usually does not exceed one half of a rating level. Harper's is no exception.
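How a small correlation can still move a teacher in the rankings may be illustrated with a rough sketch. It assumes, purely for illustration, that grades and ratings are jointly normal, so that a teacher one standard deviation above the mean in grades given has an expected rating r standard deviations above the mean. The function name is hypothetical and the model is not drawn from any of the studies reviewed.

```python
from statistics import NormalDist

def expected_rating_percentile(r, grade_z=1.0):
    """Percentile rank of the expected rating for a teacher whose grading
    lies grade_z standard deviations above the mean, given correlation r.
    ASSUMPTION: a bivariate normal model, used here only as an illustration."""
    expected_rating_z = r * grade_z
    return NormalDist().cdf(expected_rating_z)

# Correlations quoted in the text: Hudelson (+.19) and Bassin (+.10).
for r in (0.19, 0.10):
    print(r, round(expected_rating_percentile(r), 3))
```

Under this model a correlation of +.19 lifts the expected rating of a one-standard-deviation high grader from the 50th to roughly the 57th or 58th percentile, a modest shift but one that can matter when only a few promotions are available.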

The Harper English study shows that good ratings and moderate grades are not incompatible. It also shows that giving high grades does not of itself guarantee high ratings, but it does show, beyond doubt, that on journeys to the high country - the 8th, 9th and 10th deciles where the "outstanding" English teachers are - high grades seem to be essential. A recent Harper College committee report suggests that the term "outstanding" be reserved for those teachers who had scored 5.50 or above on the 6 point Harper rating scales. Only four English sections in the fall 1974-75 term reached that level. The average expected grade in the four courses was 3.27, the average final grade 3.07. In expected grade means these sections rank 1st, 7th, 8th and 11th among the 35 in the study. In final grade means they rank 1st, 2nd, 4th and 8th. The grading style indexes of the four teachers ranked 1st, 2nd, 3rd and 6th among the 16 teachers in the department.

At the bottom of Table B are the four lowest rated sections in the department. They are in the first decile college wide as well. These sections ranked 20th, 26th, 34th and 35th in expected grade, 19th, 29th, 32nd and 35th in final grade. Their teachers ranked 12th, 13th, 15th and 16th in their grading style indexes.

These rankings are quite consistent with the findings of numerous empirical studies during the past 20 years.



High Grades, High Ratings and Student Learning

Once the correlation between student grades and teacher rating is demonstrated, it becomes necessary to prove that the grades are earned by the student rather than given freely by the teacher. Otherwise, the belief in the validity of ratings must collapse. Otherwise, no teacher can be certain of the degree to which student opinion of his knowledge and technique is colored by one aspect of his teaching - grading style.

For 30 years researchers have been trying with little success to prove a connection between learning and ratings. There is a sizeable literature on the subject. The studies are no more convincing than those that tried to prove that there was no correlation between grades and ratings. There is little need to summarize here all the studies in the literature that address this problem. Neither Professor Eble nor anyone else seems to have claimed that a connection has been convincingly demonstrated, at least with rating scales that give the student an opportunity to state his preferences.

Several years ago a research study measured student writing improvement in English 101 sections at Harper. Nine of the 16 teachers included in the 1973-74 grade-rating studies participated. Four of the nine were high graders, five low. Their grading styles have not changed relative to each other since, though there has been an updrift of departmental grades as a whole. In the student achievement study, numerous gradings were made of paired start-of-semester and end-of-semester papers of 600 students.

The study did not attempt at that time to examine the relationships between grades and ratings. Its focus was only on an attempt to determine if there had been student achievement during the semester, and how much. It was found that in the most successful sections 40% to 50% of the students were writing better at the end than at the beginning. It is now possible to go back to that study to see if there was a relationship between grading styles and student achievement. There was no grading-achievement relationship at all. Of the three whose classes showed the most improvement, one is a high grader who scored high in the 1973-75 student ratings. The other two had then and have now the lowest grading style indexes in the department. Both were at or near the bottom of the evaluation rankings in 1973-74 and at the lowest deciles departmentally and institutionally in 1974-75. Neither placed a class as high as 5.00 in the ratings.

What of other, more formal, studies? Remmers and Elliott attempted to show a connection in 1949 and 1950 in the sections of the chemistry assistants, but the correlations were difficult to pin down and shifting. Russell and Bendig, following the Remmers and Elliott example, in 1953 divided psychology students into a plus group consisting of those whose final grades were higher than pretests predicted, and a minus group whose grades were lower than predicted. They found, as had Remmers and Elliott before them, that slightly higher ratings came from the plus group, but Bendig, working alone, had found in study 1953-2 that the students who got the highest grades on the final exam appreciated their teachers least.

Recently more interesting work has been done. Peter Frey of Northwestern University, writing in the October, 1973 edition of Science, tells of a study of 13 calculus classes which showed correlations of up to +.90 between student mean section final grades and mean section teacher ratings. It was a study that could have been used in Table D to show a relationship between grades and ratings, but was rejected because its only focus was on student achievement. Grading in the sections was by a departmentally prepared final with a departmental curve. The individual teachers could neither be praised nor blamed for being hard or easy graders. Frey argues that the high positive correlations between grades and ratings come about because his rating form identifies superior teaching. There is a weakness in his study in that the ratings were taken by mail after the student knew his final grades. They were not anonymous and students with low grades did not respond in the same proportion as students with high grades. So there is obviously no way of determining whether the ratings resulted from the student's discovery that he had made good grades or from his appreciation of good teaching. An influx of ratings from students with low grades might have driven the ratings down.






There is however good reason to believe that he has demonstrated a relationship between teacher ratings and student learning. His success seems to lie in his rating form, which is radically different from those in common use. Frey is not in favor of the global rating forms that measure student preference, such as the Purdue, CEQ and SIR types that have been popular since 1930. He does not use such questions as "Should this instructor be retained if suitable replacements are available?" that Remmers had on the form he used to rate the chemistry assistants, or the "excellent" to "very poor" instructor ratings on the CEQ, or the percentile rating of instructors on the SIR. Instead, Frey's key question asks the student to tell how much work he was required to do - not whether he liked doing the work, or whether he liked the teacher who assigned it to him, but simply how much there was. This question combines with another on clarity of the instructor's presentation to produce, according to Frey, positive correlations in the neighborhood of .90 with student achievement as measured by final exams. Frey explains it bluntly: lack of clarity in the teacher's presentation can be compensated for by a heavier student work load; if there is a heavy work load, explanations need not be so clear.

If the Frey scale were used instead of the student preference type now used, the
two low-rated Harper English teachers mentioned above, both of whom assign large
amounts of work, might be expected to rise rapidly in the ratings, even to the
point where they might expect to be considered for promotion. But there is
little chance that quantitative scales of the Frey type could be adopted in
teacher merit systems. Setting teachers in competition with each other to see
how much work they could assign would surely cause enrollments to decline rapidly.

A global student preference rating scale was used by Arthur Sullivan and Graham
R. Skanes in 1972 at Memorial University of Newfoundland when they found a
correlation between final exam grades and teacher ratings of •SS in 130 sections.
The final exams, counting 50% of final grade, were departmentally prepared and
graded. The sections, all of them in the sciences, math and psychology, worked
from structured common syllabi. Therefore, it would seem that the effect of
individual grading styles was partially cancelled. The study unfortunately
suffers from two weaknesses that make it difficult to point to the study as proof
that student achievement and grades are related. First, the authors did not
undertake the difficult and doubtful job of "regressing" the final exam score
to compensate for differences in initial abilities in the different sections;
second, in a follow-up study of 24 psychology teachers, the group that was found
to have produced the strongest second year psychology students was a small group
of low-rated teachers.
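
The kind of adjustment that was not made can be sketched as follows. This is an
illustration only, not the method or data of any study cited here: the function
name residual_gains and all of the section figures are hypothetical. Each
section's mean final exam score is predicted from its mean pretest score by a
least-squares line, and the residual (actual minus predicted) is taken as the
part of achievement not accounted for by initial ability.

```python
def residual_gains(pretest, final):
    """Return actual-minus-predicted final scores from a least-squares line.

    pretest, final: equal-length lists of section means (hypothetical data).
    """
    n = len(pretest)
    mean_x = sum(pretest) / n
    mean_y = sum(final) / n
    # Slope and intercept of the ordinary least-squares line final ~ pretest.
    sxx = sum((x - mean_x) ** 2 for x in pretest)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pretest, final))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual: how far each section lands above or below its prediction.
    return [y - (intercept + slope * x) for x, y in zip(pretest, final)]


# Hypothetical section means: pretest score, then final exam score.
pretest = [50.0, 60.0, 70.0, 80.0]
final = [62.0, 65.0, 78.0, 79.0]
gains = residual_gains(pretest, final)
```

A positive residual marks a section that scored higher than its pretest
predicted. It is these residuals, rather than the raw exam means, that would
then be set against the teacher ratings.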

The best known rating-achievement study is that of Miriam and Burton Rodin, published
in Science in September, 1972. They used a global preference rating scale to
find high negative correlations between the amount students learned and their
rating of teaching assistants. As in the Remmers and Elliott studies of 1948 and
1950, the assistants had little to do with assigning student grades. The highly
structured organization of the course, the exams and the methods of grading were
the creations of the senior lecturer, who taught the class three of the five
sessions each week. He was not rated by the students. The Rodins concluded
that some of the assistants forced their students to work harder than others, and
received low ratings as a result, even though their students scored higher on exams.
Hence, the negative grade-rating correlations. The study has been vigorously
attacked by numerous supporters of student ratings, among them Professor Eble,
who says:

     Ignorance continues to appear. Last fall I was invited
     to Virginia Commonwealth University to discuss evaluation
     of teaching. Among the first things that confronted me
     when I arrived was an article just printed in Science
     called 'Student Evaluation of Teachers.' The subtitle
     made the claim: 'Students rate most highly instructors from
     whom they learn the least.' I spent a good part of the
     afternoon on the health science campus pointing out that
     the research that supposedly supports this claim was pretty
     shabby even by a humanist's standards.

Professor Eble is on sound grounds, though "ignorance" is perhaps not the term
to apply to the study. The Rodins after all made a number of improvements in
the design that Remmers and Elliott used in their work with teaching assistants
at Purdue, the studies that succeeded in persuading large numbers of people that
rating scales were valid. The conclusions the Rodins draw are supported by their
data far better than those drawn by Voeks and French, whose work Professor Eble
often praises. However, the sample was inadequate and the Rodins' conclusions
should be approached cautiously.




Another atypical sample is found in the less publicized report by Richard Turner
and Robert Thompson (ERIC EJD09008261), who report that a study of graduate
students teaching 16 sections of French in 1971-72 and 24 sections in 1972-73
found substantial replicated negative correlations between student performance
on exams and the students' rating of the performance of the graduate assistants.

The evidence of learning-rating associations is weak. It seems unlikely that
convincing positive correlations between the amount the student learns and the
rating of instructors will be demonstrated soon if student preference-type
global ratings continue in use as in the past.

The evidence indicates that the problem in getting the correlations between grades
and ratings is not caused by a lack of student appreciation of teachers who are
skillful in furthering student learning. It shows that most students do want to
learn and do appreciate teachers who know their subject and can explain it
clearly. The problem seems to be that students also appreciate other things in
addition to learning. Apparently there are students sitting in every class who
need something else more than they need to learn the subject, and their presence
distorts the class mean and confuses the learning-rating issue. The need for
praise, for example, is very strong in some students, as it is in teachers; the
need for grades in some others. Sometimes the need for grades seems to be so
strong that it outweighs all other considerations. A student needing a "B" to
get a scholarship, or to stay in school, or to transfer to another school, or to
graduate might find a course a disaster if he gets a "C", even though he learned
a great deal.






Discrepant Grades and Scale Validity

The mechanism by which the grade-rating bias may work has been described in two
of the studies summarized earlier. These are studies 1972-1 (Bausell and Magoon)
and 1972-5 (Holmes). Together the studies suggest that two types of discrepant
grade expectancies are operating in the classroom. Holmes has shown that students
who are receiving lower grades than they anticipated may react by rating a teacher
down in almost all aspects of his teaching technique. This can be termed negative
discrepant grade reaction. It was found that the drop between mean section
expected grade and mean section final grade was almost twice as severe among low
grading teachers as high grading in the Harper English study. The high graders
at Harper gave final mean section grades only a quarter grade lower on the
average than the students expected. Low graders gave final grades averaging a
half grade lower.

Bausell and Magoon in their study not only detected the negative discrepant grade
reaction, but found a positive discrepant grade reaction as well. Students who
were expecting higher grades than the grade point average they brought to the
class tended to rate their teachers higher than expected.

The two types of grade discrepancies might, therefore, have influenced the Harper
study results. The negative reaction could have occurred when the student found
he was receiving lower grades than he had expected to receive, or suspected that
his final grade would be lower. It could also occur when he found that he was
receiving lower grades, or had to work harder for his grade, than his peers in
other sections of the same course.

The present reviewer has seen evidence of the negative discrepant reaction as it
occurred in an English 101 class during the 1973-74 fall semester. A high-grader
was teaching an unusually weak section. On a Monday, a week before the end of
the semester, he returned the last of a series of important tests to the class.
Most of the students had done poorly. He administered the required faculty
rating form immediately thereafter. When he examined his ratings he discovered
he had received ratings much lower than he expected. On the following Wednesday,
he announced that the last test had been cancelled, and scheduled a new
one. Two days later he gave a much easier test, and returned the papers on
the following Monday. The average student had improved his grade one grade
level. Upon taking another teacher rating immediately thereafter, he
discovered that his ratings had within a week improved almost half of a
rating level, enough for him to become a candidate for a 5% salary bonus
then offered by the Board of Trustees to outstanding teachers. The Schuh
and Crivelli study (1973-6) describes much the same student reaction.

The positive discrepant grade reaction, on the other hand, could occur when
a student taking, for example, English 101, encounters a teacher who gives
him higher grades than he received in high school English or praises his
papers more than they have been praised before. It is possible that such
a student would not only feel good about his teacher but might actually
believe he had learned more than unprejudiced before and after testing
could detect.

Both types of grade reactions would probably have their strongest effect in
multi-sectioned, non-quantitative courses like English, where grading must
be largely based on the subjective decisions of the teacher. The Holmes
study suggests, however, that even when grading is done entirely through
objective exams and the student can blame no one but himself, the
disconfirming of grade expectations can have a strong effect on ratings.

Grade differences between those teachers with high ratings and those with low
ratings mean that administrators or peer committees are asked to do an impossible
job in interpreting preference rating scales. In looking at high ratings
they must determine if praise or forgiveness in the classroom exceeded the
bounds of intellectual honesty, knowing full well that positive reinforcement
through grades may be the mark of a good teacher. In looking at low ratings
they need to determine if strict adherence to traditional work load standards
or to the college's official grading policy is a sign of bad teaching. The
presence of a grade effect suggests that teacher evaluation systems based in
whole or in part on student preference voting have always been invalid and
may have lowered the quality of college teaching, not raised it, as
supporters contend.




When I first started looking into the fringes of the literature about
student evaluation of faculty a year and a half ago, I was a supporter
of the use of student evaluations as an important part of a faculty
evaluation system. I had used them in my own classes for almost 20 years
and partly because of them had become a relaxed, permissive, high-grading
teacher. I am now convinced that quantified student ratings of the
preference type, even when used privately by the teacher for the avowed
purpose of improving instruction and never shown to anyone else, have done
more harm than good. The problem seems to be that the need of the students
to be loved, praised and rewarded, and the need of the teacher to be loved,
praised and rewarded, and the need of disciplines for their traditions,
and the need of society for standards, do not quantify together in any
rational way.

There are weaknesses in teaching that quantified scales help to correct,
but there are strengths they tend to destroy, and on balance they seem to
destroy more than they correct. One thing emerges clearly from a close
study of the literature. The scales cannot discriminate between "good"
teaching and "bad" teaching. A good reading test used in English classes
cannot reliably discriminate between the 80th and 90th percentiles in student
abilities, but it discriminates very well between the 10th and 90th percentiles.
Not so the teacher rating scales. There is little reason to believe that
those English teachers who rank above the 90th percentile in cumulative student
preference ratings are "better" teachers than those who rank below the 10th.




TABLE A

English Teacher Grading Styles
Based on Average Mean Section Grades

   (1)           (2)                       (3)            (4)          (5)
                                           Grading        % of A's     % of A's
   Fall 1973     Fall 1974*    Teacher     Style Index    Fall 1973    Fall 1974

High Graders
   3.14          3.16          (1)         3.15           31           29
   3.02          2.75          (2)         2.92           39           26
   3.01          2.84          (3)         2.88           29           22
   2.92          2.72          (4)         2.82           23           18
   2.63          2.78          (5)         2.70           38           44
   2.74          2.64          (6)         2.69           13           27
   2.66          2.66          (7)         2.66           30           31
   2.40          2.76          (8)         2.58           24           29

Low Graders
   2.40          2.56          (9)         2.48            7           12
   2.?1          2.26          (10)        2.39           19            8
   2.21          2.27          (11)        2.27           10
   unavailable   2.13          (12)        2.13            X           21
   1.92          2.26          (13)        2.09            ?           14
   2.08          2.05          (14)        2.07            6            9
   2.06          2.06          (15)        2.06            7           12
   2.17          1.71          (16)        1.94           16            8

   2.53          2.48*                     2.49

* 1973 grades: A=4, B=3, C=2, D=1, F=0
  1974 grades: A=4, B=3, C=2, D=1, F=0, N=0

The "N" grade had the effect of lowering teacher grade point averages an unknown
amount, since some teachers used it to replace F's, incompletes and unofficial
withdrawals. The latter two were not computed in the 1973 averages.



TABLE B

Teacher Evaluation Means in 35
English Sections
(6 Point Scale: 1VP, 2P, 3F, 4G, 5VG, 6EX)

   Section Scores (1)            Section Scores (1)
   of High Graders               of Low Graders

   5.81*                         5.29
   5.73*                         5.29
   5.62*                         5.27
   5.57   (2)                    5.18*
   5.47*                         5.15*
   5.40                          5.15*
   5.38                          5.08*
   5.38                          5.00
   5.??   Median                 5.00   Median
   5.31                          4.71*
   5.17                          4.63
   5.14*                         4.27
   5.00*                         4.06*
   4.89                          4.06
   4.79                          3.89*
   4.56                          3.75   (2)
   4.21

   Mean 5.22                     Mean 4.78

* Courses marked with asterisks are English 101 required composition courses.
  Those not marked are literature courses or other electives.

(1) Based on two year cumulative grading style index: average grades given to all
    students in two semesters a year apart. The grading style index range of the
    8 high graders was 2.58 to 3.15 on a 4 point grading scale. The index for the
    8 low graders was 1.94 to 2.48.
(2) Average grading style index of 4 outstanding sections, 2.91; of 4 worst
    sections, 2.06.



TABLE C

Rank Orders From the Section Giving Each of 16 Teachers His
or Her Highest Rating in Fall, 1974

   (1)              (2)        (3)             (4)       (5)             (6)
   Student Rating              Teacher                   Section Final
   of Instructors   Rating     Grading Style   Index     Grade Mean      Final Grade
   Section Mean     Dept.      Index           Dept.     (A-W)           Dept. Rank
                    Rank                       Rank

                               2.70             5        2.73             8
                               2.69             6        2.96             4
                               2.88             3        3.28             2
                               3.15             1        3.33             1
                               2.92             2        2.56            11
                               2.07            14        3.08             3
                               2.82             4        2.76             7
                               2.66             7        2.88             5
                               2.27            11        2.71             9
                               2.39            10        2.82             6
                               2.58             8        2.52            12
                               2.48             9        2.68            10
                               2.09            13        2.39            13
                               1.94            16        1.74            16
                               2.13            12        2.26            14
                               2.06            15        2.05            15

Spearman Correlations:

   Column 2 with Column 4 = +.75
   Column 2 with Column 6 = +.79

Both significant beyond the .01 level.
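
The rank-order (Spearman) coefficients reported for Table C can be computed with
the usual formula rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), where d is the difference
between a teacher's two ranks. The sketch below is an illustration only. The
final-grade ranks are as read from Column 6 of this copy of the table, with the
two unclear entries taken as 11 and 14. The rating ranks are assumed, since they
are not legible in this copy, to run from 1 to 16 in row order. The function name
is hypothetical, and the formula as written assumes no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two equal-length lists of untied ranks."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Assumption: rating ranks run 1..16 in row order (not legible in this copy).
rating_rank = list(range(1, 17))
# Column 6 as read from Table C; the 11 and the 14 are inferred entries.
final_grade_rank = [8, 4, 2, 1, 11, 3, 7, 5, 9, 6, 12, 10, 13, 16, 14, 15]

rho = spearman_rho(rating_rank, final_grade_rank)
print(round(rho, 2))
```

Under these assumptions the result rounds to .79, in line with the figure given
for Column 2 with Column 6.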






TABLE D

Correlations: Grades to Ratings

   (1)                                (2)              (3)             (4)               (5)
   Comprehensive Listing                               Association     Association       Association
   of All Available 1930-74                            Judged          Judged Low        Judged Marked
   Published Research                 Sample           Negligible      to Moderate       or Important

   1930 (Remmers)                     11T, 409R        +.07
   1934 (Starrak)                     40,000R          +.15
   1936-1 (Heilman & Armentrout)      *46T             -.04
   1936-2 (Blum)                      1T, 57R          Nil
   1949 (Remmers et al.)              37T              +.13 to .35
   1950-1 (Elliott)                   Unstated         +.03
   1950-2 (Elliott)                   3786R            Unstated
   1951 (Hudelson)                    192R             +.19
   1952-1 (Voeks & French)            34T              Nil
   1952-2 (Voeks & French)            20T              Nil
   1952-3 (Voeks & French)            16T              Nil
   1953-1 (Bendig)                    5T               +.14 to .28
   1953-2 (Bendig)                    5T               High Neg.
   1953-3 (Anikeef)                   *19T                                               +.73 Merit
   1954 (Clark & Keller)              15,000R          Unstated
   1960 (Weaver)                      *12T, 699R                                         +.001 level
   1962 (Garverick & Carter)          1T, 164R         +.08
   1964 (Enchandia)                   16T                              +Unstated
   1965-1 (Spencer & Dick)            600R                             +Unstated
   1965-2 (Spencer & Dick)            *12 Sec.                                           +.85 to .91
   1966-1 (Overturf & Price)          10,000R          +.17 Merit
   1966-2 (Stewart & Malpass)         *67T                                               +.001 level
   1969-1 (Walker)                    30T                              +Unstated
   1969-2 (Caffrey)                   3T, 131R         +.23 to .32     +.09 to .44
   1970 (Rubenstein & Mitchell)       *60 Sec.
   1971-1 (Holmes)                    7 Sec.                           +5% of Var.       +.01 level
   1971-2 (Wiegel et al.)             4T, 331R
   1971-3 (Hildebrand)                1015R            +.09?
   1972-1 (Bausell & Magoon)          *12,000R                                           +.53 to .63
   1972-2 (Nichols & Soper)           *339 Sec.                                          +.53 Merit
   1972-3 (Kelley)                    1T, 258R         +.02 of Var.
   1972-4 (Kennedy)                   549R                             +Unstated
   1972-5 (Holmes)                    1T, 97R                                            +various
   1973-1 (Rosenshine et al.)         *1200 Sec.                       +.09 to .27
   1973-2 (Perry & Baumann)           *900R                                              +.26 to .78
   1973-3 (Centra & Linn)             300R                             +Unstated
   1973-4 (Mirus)                     *122 Sec.                        +.14 to .21
   1973-5 (Granzin & Painter)         *17 Sec.                                           +.?5 Merit
   1973-6 (Schuh & Crivelli)          1T,                              +11% of Var.      +.001 level
   1974-1 (Cornwell)                  *70T                             +.10, 30% lies
   1974-2 (name illegible)            *64T

   Comparison

   1974 Harper (Powell)               18T                                                +.73
   1975 Harper (Powell)               16T, 35 Sec.                     +.43 to .79

T-Teachers   R-Ratings   Sec.-Sections   Var.-Variance   Level-Level of Confidence
Merit: The author indicates that ratings were used to decide pay, promotions, etc.

* An asterisk indicates the report met minimum research and research reporting
  requirements: typicality and size of sample, design, reporting of data, etc.



APPENDIX E

A Statistical Note

Two statistical symbols are used in this paper. The first, correlation
coefficients, are estimates, reached through standardized algebraic
formulas, of the degree of association between two or more sets of figures.
They are stated in terms of departure from zero correlation (0.00).
Perfect positive correlation is +1.00, perfect negative, -1.00. The
+ sign is usually omitted before positive correlations. A perfect 1.00
correlation would be produced if, when grading on and being rated on a
five point scale, the teacher received all fives from his A students, an
average of four from his B students, an average of three from his C students,
etc. A correlation of .50 would probably occur if a teacher received
average ratings of 3.50 from C students, 4.00 from B students, 4.50 from
A students. Average ratings of 4.10, 4.20, and 4.30 respectively would
produce a correlation of .10. Equal negative correlations would occur if
grades were inversely related to ratings. They would be of equal
significance. Obviously, .79 is a strong showing, .10 a weak one.
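
As a minimal sketch of how such a coefficient is computed, the product-moment
(Pearson) formula can be written out directly. The function name and the five
grade-rating pairs are hypothetical, chosen so that each step down in grade is
matched by one step down in rating, which is the perfect case described above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two lists, and the variation within each list.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical students: grade points (A=4 ... F=0) and the rating each gave.
grades = [4, 3, 2, 1, 0]
ratings = [5, 4, 3, 2, 1]

r_perfect = pearson_r(grades, ratings)        # ratings fall in step with grades
r_inverse = pearson_r(grades, ratings[::-1])  # ratings rise as grades fall
```

With these made-up figures the two coefficients come out at +1.00 and -1.00.
Real class data, in which students with the same grade give different ratings,
fall somewhere in between.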

The second type of statistical symbols, level of significance notations,
are estimates of how likely it is that the results found occurred by chance.
They are expressed as decimal fractions. A .05 level of significance means that
the results might occur by chance alone five times in a hundred; .01
indicates one chance in a hundred; .001, one chance in a thousand. It's
important to remember that a high level of confidence does not necessarily
mean that a high correlation is present if the sample is large. A small
sample, on the other hand, must produce higher correlations before
significance can be claimed. Several researchers, finding interesting correlations
of .35 to .45 between grades and ratings, have dismissed them as
non-significant because they were working with small samples. A correlation of .49, for
example, is needed to claim significance at the .05 level when one is working
with a sample of 16 teachers. It's also important to keep in mind that some
researchers, having found statistically significant but small correlations
of .10, .20, and .30, have dismissed them as negligible, trifling, or slight.
In doing so they are exercising statistical judgment, which may or may not
be sound. Statistical practice is involved, but not rigid statistical law.
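
The way the threshold for significance depends on sample size can be illustrated
with the standard conversion from a critical t value to a critical correlation,
r = t / sqrt(t^2 + df), with df = n - 2. This is a sketch under stated
assumptions: a two-tailed test of a product-moment correlation, with the
critical t values copied from a standard table for three sample sizes only. The
function and table names are hypothetical.

```python
from math import sqrt

# Two-tailed .05 critical values of Student's t, copied from a standard
# t table and keyed by degrees of freedom (df = n - 2).
T_CRIT_05 = {14: 2.145, 30: 2.042, 100: 1.984}


def critical_r(n):
    """Smallest |r| significant at the .05 level (two-tailed) for n pairs."""
    df = n - 2
    t = T_CRIT_05[df]  # only the sample sizes tabulated above are supported
    return t / sqrt(t * t + df)


for n in (16, 32, 102):
    print(n, round(critical_r(n), 3))
```

For 16 teachers the threshold comes out just under .50, close to the .49 cited
above, while for about a hundred cases a correlation under .20 is enough. That
is why a large sample can make a small correlation "significant" and a small
sample can leave a sizable one "non-significant."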






